Parsing a SQL spatial column in Python

I am struggling a bit as I am new to programming. I am currently writing a Python script and I am a bit stuck. The goal is to parse some spatial information that gets pulled from SQL into a format that is usable for my Python script down the line.
I was able to CAST through a SQL query and fetchall using the odbc module. However, once I fetch the data, that is where it gets tricky for me. Here is an example of a print from the fetchall:
[(u'POLYGON ((7014.186279296875 6602.99658203125 1612.5, 7015.984375 6600.416015625 1612.5))',), (u'POLYGON ((6730.962646484375 6715.2490234375 1522.5, 6730.0869140625 6714.13916015625 1522.5))',)]
I am not exactly sure what I am getting here; it looks like a list of tuples, which I have tried converting to a list of lists, but there must be something I am missing.
Here is the usable format I am looking for:
[[7014.186279296875, 6602.99658203125, 1612.5], [7015.984375, 6600.416015625, 1612.5]]
[[6730.962646484375, 6715.2490234375, 1522.5], [6730.0869140625, 6714.13916015625, 1522.5]]
Any ideas of how I can accomplish this? Maybe there is a better way to CAST in SQL, or a module in Python that would be easier to use instead of just doing a cursor.fetchall() and parsing? Any parsing help would be useful. Thanks.

If you want to do the parsing yourself, it should be straightforward. For the example you've provided, the following code would do the trick:
result = []
for element in data:
    # Strip the leading 'POLYGON ((' (10 characters) and the trailing '))',
    # leaving just the comma-separated coordinate triples.
    single_elements = element[0][10:-2].split(', ')
    for se in single_elements:
        row = str(se).split(' ')
        result.append([float(a) for a in row])
The result will contain what you need. If parsing is not an option, then paste some of your code so I can see how you're fetching the data.
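If the fixed character offsets worry you, a slightly more defensive variant is to locate the parentheses instead of hard-coding positions. This is just a sketch and assumes every row holds a single POLYGON in the ((x y z, x y z)) shape shown above:

def parse_wkt_coords(wkt):
    # Take everything between '((' and '))' and split it into coordinate
    # triples, converting each number to a float.
    inner = wkt[wkt.find('((') + 2 : wkt.rfind('))')]
    return [[float(n) for n in point.split()] for point in inner.split(',')]

for (wkt,) in data:
    print(parse_wkt_coords(wkt))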

Related

Dynamic, Nested Replace

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I need to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets, so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row, both in content and length, but that isn't relevant to what I need other than making the replacing more complex.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!
Depending on how you have that string holding the XML fragment available, you could try something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
You convert the string to XML and then use the built-in XML value() method to extract just the text content.
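If you ever need the same extraction outside of SQL Server, the identical idea translates directly to Python's standard library: parse the fragment as XML and concatenate its text nodes. Just a sketch for comparison, using the sample string from the question:

import xml.etree.ElementTree as ET

fragment = '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>'

# itertext() walks the tree and yields every piece of text content,
# skipping the tags themselves.
root = ET.fromstring(fragment)
print(''.join(root.itertext()))  # Treatment by test.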

Django: get result of raw SQL query in one line

This works:
connection = get_connection()
cursor = connection.cursor()
cursor.execute('show application_name')
application_name_of_connection = cursor.fetchone()[0]
But why four lines? Is there no way to get this in one line?
No, there is no way to do this in one line. Languages such as Python are designed to describe a set of instructions, and there are four instructions here. There may be helper methods or clever/inefficient arrangements of these statements that could get this down to three lines, but you are best off keeping it as it is.
For example, if for some reason you are using this code frequently, you would encapsulate it in its own function:
def get_application_name_of_connection():
    connection = get_connection()
    cursor = connection.cursor()
    cursor.execute('show application_name')
    return cursor.fetchone()[0]
And then simply:
get_application_name_of_connection()
This is how it works. You want less code? Hide the functionality.
It can be done with a comprehension, but only on SQLite:
results = [row[0] for row in get_connection().cursor().execute('select * from mytable')]
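That one-liner works because sqlite3's cursor.execute() returns the cursor itself, which is iterable. A minimal self-contained demonstration (the table name and data are made up here):

import sqlite3

# Throwaway in-memory database just for illustration.
conn = sqlite3.connect(':memory:')
conn.execute('create table mytable (id integer)')
conn.executemany('insert into mytable values (?)', [(1,), (2,)])

results = [row[0] for row in conn.cursor().execute('select * from mytable')]
print(results)  # [1, 2]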
If you are using PostgreSQL then it doesn't work (psycopg2's execute() returns None rather than the cursor), but you could try this for a more compact form:
with get_connection().cursor() as cur:
    cur.execute('show all')
    print cur.fetchone()
But do this only if you want the cursor thrown away after the block executes.
I haven't tried this on other database managers, so results may vary.

OpenRefine remove duplicates from list with jython

I have a column with values that are duplicated e.g.
VMS5796,VMS5650,VMS5650,CSL,VMA5216,CSL,VMA5113
I'm applying a transform using Jython that removes the duplicates (On error is set to "keep original"); here's the code:
return list(set(value.split(",")))
Which works in the preview, but isn't getting applied to the column. What am I doing wrong?
The map function is a powerful and underused function in Python / Jython. It is probably unclear what this code does internally, but it is extremely fast at processing millions of values from a list or array in your column's cells: each value is 'mapped' to a string type, and a join with a separator such as ', ' is then applied:
deduped_list = list(set(value.split(",")))
return ', '.join(map(str, deduped_list))
There are probably other, even slightly faster variations than this, but this should get you going in the right direction.
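For example, with the sample value from the question (the output order is arbitrary, since sets are unordered):

value = "VMS5796,VMS5650,VMS5650,CSL,VMA5216,CSL,VMA5113"
deduped_list = list(set(value.split(",")))
print(', '.join(map(str, deduped_list)))
# e.g. CSL, VMA5113, VMA5216, VMS5650, VMS5796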
Interestingly, you can also get the 'printable representation' with repr(object), which is acceptable to an eval like OpenRefine's and can be useful for seeing the representation of your values as well... which I just found out about while researching this answer in more depth for you.
deduped_list = list(set(value.split(",")))
return ', '.join(map(repr, deduped_list))
The preview implicitly formats things for display. Your expression returns a list (which can't be stored in a cell), so if you'd like it in string form, wrap it in a ','.join(...).
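Putting the two answers together, the whole transform becomes a one-liner (note that set() does not preserve the original order of the values):

return ','.join(set(value.split(',')))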

Compare 2 datasets with dbunit?

Currently I need to create tests for my application. I used dbunit to achieve that, and now I need to compare 2 datasets:
1) The records from the database, which I get with QueryDataSet
2) The expected results, written in the appropriate flat XML in a file, which I read in as a dataset as well
Basically, 2 datasets can be compared this way.
Now the problem is columns with a Timestamp. They will never match the expected dataset. I really would like to ignore them when comparing, but it doesn't work the way I want it to.
It does work when I compare each table on its own, adding a column filter and ignoreColumns. However, this approach is very cumbersome, as many tables are used in the comparison, and it forces one to add so much code that it eventually gets bloated.
The same applies to fields which have null values.
A possible solution would be if I could compare only the very first column of all tables - addressing it not by its column name but by its column index. But I can't find anything like that.
Maybe I am missing something, or maybe it just doesn't work any other way than comparing each table on its own?
For the sake of completeness, some additional information must be posted. Actually, my previously posted solution will not work at all, as the process of reading data from the database got me trapped.
The process using QueryDataSet did read the data from the database and save it as a dataset, but the data couldn't be accessed from this dataset anymore (although I could see the data in debug mode)!
Instead the whole operation failed with an UnsupportedOperationException at org.dbunit.database.ForwardOnlyResultSetTable.getRowCount(ForwardOnlyResultSetTable.java:73)
Example code to produce failure:
QueryDataSet qds = new QueryDataSet(connection);
qds.addTable("specificTable");
qds.getTable("specificTable").getRowCount();
Even if you try it this way it fails:
IDataSet tmpDataset = connection.createDataSet(tablenames);
tmpDataset.getTable("specificTable").getRowCount();
In order to make extraction work you need to add this line (the second one):
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
Great that this is documented nowhere...
But that is not all: now you'd certainly think that one could add this line after building a QueryDataSet as well... but no! That still doesn't work! It still throws the same exception! It makes no sense to me, and I wasted so much time on it...
It should be noted that extracting data from a dataset that was read in from an XML file works without any problem. This annoyance only happens when trying to get a dataset directly from the database.
Once you have done the above, you can continue as below, which compares only the columns present in the expected XML file:
// put in here some code to read in the dataset from the xml file...
// and name it "expectedDataset"
// then get the tablenames from it...
String[] tablenames = expectedDataset.getTableNames();
// read dataset from database table using the same tables as from the xml
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
for (int i = 0; i < tablenames.length; i++)
{
    ITable expectedTable = expectedDataset.getTable(tablenames[i]);
    ITable actualTable = actualDataset.getTable(tablenames[i]);
    ITable filteredActualTable = DefaultColumnFilter.includedColumnsTable(actualTable, expectedTable.getTableMetaData().getColumns());
    Assertion.assertEquals(expectedTable, filteredActualTable);
}
You can also use this format:
// Assert actual database table match expected table
String[] columnsToIgnore = {"CONTACT_TITLE","POSTAL_CODE"};
Assertion.assertEqualsIgnoreCols(expectedTable, actualTable, columnsToIgnore);

JSON filter (like SQL SELECT)

I have a JSON file/stream, and I'd like to be able to do a SQL-style SELECT against it.
so here is the file
The file contains all the data I have. I'd like to be able to show, let's say: all the odeu_nom and odeu_desc where categorie=Feuilles.
If you can do that with PHP and JSON (eval), fine... tell me how.
In SQL, by comparison, I would do: SELECT * FROM $json WHERE categorie=Feuilles
p.s. I have found jsonpath, which is an XPath for JSON... maybe another option?
p.s. #2... with some research, I have found another option: the JSON is the same as an array, so maybe I can filter the array and just return the ones I need?... How do I do that?
It makes more sense to try and stick with XPath-style selectors (like jsonpath) rather than using SQL, even if you are more familiar with SQL.
The advantage of the "path" approach is that it more readily expresses the hierarchical structure implicit in XML/JSON, as opposed to SQL, which requires various joins to help it "get out of its rectangular/tabular prison".
Although I have never used jsonpath, from reading its summary page I believe the following should produce all the odeu_nom for objects whose categorie is 'Feuilles' (given the JSON input referred to in the question):
$.Liste_des_odeurs[?(@.categorie == 'Feuilles')].odeu_nom
which corresponds to the following XPath:
/Liste_des_odeurs[categorie='Feuilles']/odeu_nom
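As for the "filter the array" idea from p.s. #2, here is that selection sketched in Python for illustration (the filename is hypothetical, and this assumes the JSON holds a top-level Liste_des_odeurs list of objects, as the paths above imply):

import json

# Hypothetical filename; substitute whatever holds your JSON.
with open('odeurs.json') as f:
    data = json.load(f)

# Keep only the objects whose categorie is 'Feuilles', then pull out
# the two fields the question asks for.
feuilles = [(o['odeu_nom'], o['odeu_desc'])
            for o in data['Liste_des_odeurs']
            if o.get('categorie') == 'Feuilles']
print(feuilles)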
Et voila...
BTW, 'Jazz is not dead, it just smells funny' (F Zappa)