4. API reference¶
These are the modules, in maturity order:

xleash
    A mini-language for “throwing the rope” around rectangular areas of Excel-sheets.
mappings
    Hierarchical string-like objects useful for indexing, which can be renamed/relocated at a later stage.
pandata
    A pandas-model is a tree of strings, numbers, sequences, dicts, pandas instances and resolvable URI-references, implemented by Pandel.
components
    Defines the building-blocks of a “model”.
4.1. Module: pandalone.xleash
¶
A mini-language for “throwing the rope” around rectangular areas of Excel-sheets.
4.1.1. About¶
Any decent dataset is stored in csv. Nevertheless, many datasets are still trapped in excel-sheets.
XLeash defines a url-fragment notation (xl-ref) that renders the capturing of tables from sheets as practical as reading a csv, even when the exact positions of those tables are not known beforehand.
An additional goal is to apply the same lassoing operation recursively, to build data-trees. To that end, the syntax supports filter transformations such as:
- setting the dimensionality of the result tables,
- creating higher-level objects from 2D capture-rect (dictionaries, numpy-arrays & dataframes).
It is based on the xlrd library, but is also checked for compatibility with the xlwings COM-client library. It requires numpy and (optionally) pandas. Since 2019 it is Python-3 only, tested on 3.5+.
4.1.2. Overview¶
The xl-ref notation extends ordinary A1 and RC excel coordinates with
conditional traversing operations, based on the cell’s empty/full state.
For instance, to extract a contiguous table near the A1 cell and make a pandas.DataFrame out of it, use this:
from pandalone import xleash, SheetsFactory

shfac = SheetsFactory()
shfac.list_sheetnames('path/to/workbook.xlsx')
## --> ['Sheet1', ...]

## Search and capture the first contiguous table from the 1st sheet
#  as a pandas-DataFrame:
df = xleash.lasso('path/to/workbook.xlsx#0!A1(DR):..(DR):RLDU:["df"]',
                  sheets_factory=shfac)

## Assuming the sheet contains a single table, a lone `:` fetches
#  the same contents.  Additionally, it is possible to skip the
#  sheetname/sheet-index (1st sheet implied):
df = xleash.lasso('#:["df"]',
                  url_file='path/to/workbook.xlsx',
                  sheets_factory=shfac)
4.1.2.1. Xl-ref Syntax¶
[<url>]#[<sheet>!][<1st-edge>][:[<2nd-edge>][:<expansions>]][:<filters>]
- See edge, expansion-moves, filters for details.
- Missing edges are implicitly replaced by ^^:__ (top-left/bottom-right).
- Spaces are allowed only in filters.
4.1.2.2. Annotated Example¶
target-moves─────┐
landing-cell──┐ │
┌┤ ┌┤
#C3(UL):..(RD):RULD:["pipe": ["odict", "recursive"]]
└─┬──┘ └─┬──┘ └┬─┘ └──────────────┬───────────────┘
1st-edge───────┘ │ │ │
2nd-edge──────────────┘ │ │
expansions──────────────────┘ │
filters────────────────────────────────────────┘
Which means:
- Target the 1st edge of the capture-rect by starting from the C3 landing-cell. If it is a full-cell, stop; otherwise start moving up and to the left of C3 and stop on the first full-cell.
- Continue from the last target and travel the exterior row and column right and down, stopping on their last full-cell.
- Capture all the cells between the 2 targets.
- Try expansions in all directions wherever there is a neighbouring full-cell.
- Finally filter the values of the capture-rect to wrap them up in an ordered-dictionary, and dive into its values searching for xl-refs, replacing them.
4.1.2.3. Basic Usage¶
The simplest way to lasso a xl-ref is through lasso().
A common task is to capture all non-empty cells of the 1st workbook-sheet but
without any bordering nulls:
>>> from pandalone import xleash
>>> values = xleash.lasso('path/to/workbook.xlsx#:')
Assuming that the full-cells of the 1st sheet of the workbook on disk are those marked with 'X', then the resulting capture-rect of the above call would be a 2D list-of-lists with the values contained in C2:E4:
A B C D E
1 ┌─────┐
2 │ X│
3 │X │
4 │ X │
5 └─────┘
If another sheet is desired, add its name or 0-based ordinal immediately after # and separate it with a ! from the rest of the xl-ref - which in that case might be empty:
>>> lasso = xleash.lasso
>>> lasso('Book.xlsx#Sheet1!') == lasso('Book.xlsx#0!') == lasso('Book.xlsx#:')
True
If you do not wish to let the library read your workbooks, you can
invoke the function with a pre-loaded sheet.
Here we will use the utility ArraySheet
with a more complicated
xl-ref expression:
>>> sheet = xleash.ArraySheet([[None, None, 'A', None],
... [None, 2.2, 'foo', None],
... [None, None, 2, None],
... [None, None, None, 3.14],
... ])
>>> xleash.lasso('#A1(DR):..(DR):RULD', sheet=sheet)
[[None, 'A'],
[2.2, 'foo'],
[None, 2]]
The capture-rect in this case was B1 to C3, as can be seen by inspecting the st and nd fields of the full Lasso returned:
>>> xleash.lasso('#A1(DR):..(DR):RULD', sheet=sheet, return_lasso=True)
Lasso(xl_ref='#A1(DR):..(DR):RULD',
url_file=None,
sh_name=None,
st_edge=Edge(land=Cell(row='1', col='A'), mov='DR', mod=None),
nd_edge=Edge(land=Cell(row='.', col='.'), mov='DR', mod=None),
exp_moves='RULD',
call_spec=None,
sheet=ArraySheet(SheetId(book='wb', ids=['sh', 0]),
[[None None 'A' None]
[None 2.2 'foo' None]
[None None 2 None]
[None None None 3.14]]),
st=Coords(row=0, col=1),
nd=Coords(row=2, col=2),
values=[[None, 'A'],
[2.2, 'foo'],
[None, 2]],
base_coords=None,
...
For controlling explicitly the configuration parameters and the opening of workbooks, use separate instances of Ranger and SheetsFactory, which are the workhorses of this library:
>>> with xleash.SheetsFactory() as sf:
... sf.add_sheet(sheet, wb_ids='foo_wb', sh_ids='Sheet1')
... ranger = xleash.Ranger(sf, base_opts={'verbose': True})
... ranger.do_lasso('foo_wb#Sheet1!__').values
3.14
Notice that it returned a scalar value, since we specified only the 1st edge as '__', which points to the bottom-most row and right-most column of the sheet.
Alternatively you can call make_default_Ranger() for extending the library’s defaults.
4.1.2.4. More Syntax Examples¶
Another typical but more advanced case is when a sheet contains a single table with a “header”-row and an “index”-column. There are (at least) 3 ways to do it, beyond specifying the exact coordinates:
   A B C D E
1   ┌───────┐   B2:E4          ## Exact referencing.
2   │  X X X│   ^^:__  or :    ## From top-left full-cell to bottom-right.
3   │X X X X│   A1(DR):__:U1   ## Start from A1 and move down and right
3   │X X X X│   #    until B3; capture till bottom-right;
4   │X X X X│   #    expand once upwards (to header row).
    └───────┘   A1(RD):__:L1   ## Start from A1 and move right row-by-row
                #    until C2; capture till bottom-right;
                #    expand once left (to index column).
Note that if B1 were full, the results would still be the same, because the U1/L1 expansions expand only if a full-cell is found in the row/column being expanded into.
In case the sheet contains more than one disjoint table, the bottom-right cell of the sheet would not coincide with the table-end, so the handy last two xl-refs above would not work.
For that, we may resort to dependent referencing for the 2nd edge, and define its position in relation to the 1st target:
A B C D E
1 ┌─────┐ _^:..(LD+):L1 ## Start from top-right(E2) and target left
2 │ X X│ # left(D2); from there capture left-down
3 │X X X│ # till 1st empty-cell(C4, regardless of
4 │X X X│ # col/row order); expand left once.
└─────┘ ^_(U):..(UR):U1 ## Start from B5 and target 1st cell up;
5 X # capture from there till D3; expand up.
In the presence of empty-cells breaking the exterior row/column of the 1st landing-cell, the capturing becomes more intricate:
A B C D E
1 ┌─────┐ B2:D_
2 │ X X│ A1(RD):..(RD):L1D
3 │X X │ D_:^^
3 │X │ A^(DR):D_:U
4 │ X │X
└─────┘
A B C D E
1 ┌───┐ ^^(RD):..(RD)
2 │X X│ _^(R):^.(DR)
3 X│X │
└───┘
3 X
4 X X
A B C D E
1 ┌───┐ B2:C4
2 │ X│X A1(RD):^_
3 │X X│ C_:^^
3 │X │ A^(DR):C_:U
4 │ X│ X ^^(RD):..(D):D
└───┘ D2(L+):^_
See also
Example spreadsheet: xleash.xlsx
4.1.3. Definitions¶
- lasso
- lassoing
It may denote 3 things:
- the whole procedure of parsing the xl-ref syntax, capturing values from spreadsheet rect-regions and sending them through any filters specified in the xl-ref;
- the lasso() and/or Ranger.do_lasso() functions performing the above job;
- the Lasso storing intermediate and final results of the above algorithm.
- xl-ref
Any url with its fragment abiding to the syntax defined herein.
- The fragment describes how to capture rects from excel-sheets, and it is composed of 2 edge references followed by expansions and filters.
- The file-part should resolve to an excel-file.
- parse
- parsing
- The stage where the input string gets split and checked for validity against the xl-ref syntax.
- edge
An edge might signify:
- the syntactic construct of the xl-ref, composed of a pair of row/column coordinates, optionally followed by parenthesized target-moves, like A1(LU);
- the bounding cells of the target-rect;
- the bounding cells of the capture-rect.
- 1st
- 2nd
It may refer to the 1st/2nd:
- edge of some xl-ref;
- landing-cell of an edge;
- target-cell of an edge;
- capture-cell of a capture-rect.
The 1st-edge supports absolute coordinates only, while the 2nd-edge also supports dependent ones, relative to the 1st target-cell.
- landing-cell
- The cell identified by the coordinates of the edge alone.
- target-cell
- target-rect
- The bounding cell identified after applying target-moves on the landing-cell.
- target
- targeting
The process of identifying any target-cell bounding the target-rect.
- The search for the target-cell starts from the landing-cell, follows the specified target-moves, and ends when a state-change is detected on an exterior column or row, according to the enacted termination-rule.
- Failure to identify any target-cell raises an EmptyCaptureException, which is subsequently translated as an empty capture-rect by Ranger when opts contain {"no_empty": false} (default).
- The process is followed by expansions to identify the capture-rect.
Note that in the case of a dependent 2nd edge, the target-rect would always be the same, irrespective of whether target-moves denoted a row-by-row or column-by-column traversal.
- capture
- capturing
It is the overall procedure of:
- targeting both edge refs to come up with the target-rect;
- performing expansions to identify the capture-rect;
- extracting the values and feeding them to filters.
- capture-rect
- capture-cell
- The rectangular-area of the sheet denoted by the two capture-cells identified by capturing, that is, after applying expansions on target-rect.
- directions
- The 4 primitive directions, denoted with one of the letters LURD. These are used to express both target-moves and expansions.
- coordinate
- coordinates
- Any pair of row/column coordinates specifying cell positions (i.e. landing-cell, target-cell, bounds of the capture-rect), written as the first part of the edge syntax, or implicitly resolved. They can be expressed in A1 or RC format, or as a zero-based (row, col) tuple (num). Each coordinate might be absolute or dependent, independently.
- traversing
- traversal-operations
- Either the target-moves or the expansion-moves that comprise the capturing.
- target-moves
Specify the cell traversing order while targeting, using pairs of primitive directions. The pairs UD and LR (and their inverses) are invalid. E.g. DR means: “Start going down, then traverse column-by-column to the right, each column from top to bottom.”
- move-modifier
- One of the + and - characters that might trail the target-moves, defining which termination-rule to follow when the landing-cell is a full-cell, e.g. A1(RD+).
- expansions
- expansion-moves
Due to state-changes on the ‘exterior’ cells, the capture-rect might be smaller than some wider, contiguous but “convex” rectangular area.
The expansions attempt to remedy this by providing for expansion in arbitrary directions, each accompanied by a multiplicity. If a multiplicity is unspecified, infinite is assumed, so the rect expands until an entirely empty (or full) row/column is met.
- absolute
Any cell row/col identified with column-characters, row-numbers, or the following special characters:
- ^: the top/left full-cell coordinate;
- _: the bottom/right full-cell coordinate.
- dependent
- base-cell
A landing-cell any of whose coordinates is identified with a dot (.), which resolves to the base-coordinate depending on which edge it refers to:
- 1st edge: the coordinates of the base-cell field of the Lasso given to Ranger.do_lasso(); must not be None.
- 2nd edge: the target-cell coordinates of the 1st edge.
An edge might contain a “mix” of absolute and dependent coordinates.
- state
- full-cell
- empty-cell
- A cell is full when it is not empty / blank (in Excel’s parlance).
- states-matrix
- A boolean matrix denoting the state of the cells, having the same size as the sheet it was derived from.
- state-change
- Whether we are traversing from an empty-cell to a full-cell, and vice-versa, while targeting.
- termination-rule
The condition to stop targeting while traversing from the landing-cell. There are 2 rules: search-same and search-opposite.
See also
Check Target-termination enactment for the enactment of the rules.
- search-opposite
- The target-cell is the FIRST full-cell found while traveling from the landing-cell according to the target-moves.
- search-same
- The coordinates of the target-cell are given by the LAST full-cell on the exterior column/row according to the target-moves; the order of the moves is insignificant in that case.
- exterior
- The column and the row of the landing-cell; the search-same termination-rule gets triggered by ‘full-cells’ only on them.
- filter
- filters
- The last part of the xl-ref specifying predefined functions to apply for transforming the cell-values of capture-rect, abiding to the json syntax. They may be bulk or element-wise.
- bulk
- bulk-filter
- A filter treating capture-rect values as a whole, e.g. transposing arrays, is_empty.
- element-wise
- element-wise-filter
- A filter diving into capture-rect values, e.g. python-eval.
- call-specifier
- call-spec
The structure to specify some function call in the filter part; it can be either a json string, list or object, like these:
- string: "func_name"
- list: ["func_name", ["arg1", "arg2"], {"k1": "v1"}], where the last 2 parts are optional and can be given in any order;
- object: {"func": "func_name", "args": ["arg1"], "kwds": {"k":"v"}}, where the args and kwds are optional.
If the outer-most filter is a dictionary, an 'opts' kwd is popped-out as the opts.
- opts
- Key-value pairs affecting the lassoing (i.e. opening xlrd-workbooks). Read the code to be sure what the available choices are :-( They are a combination of options specified in code (i.e. in the lasso() call) and those extracted from filters by the ‘opts’ key, and they are stored in the Lasso.
. - backend
- backends
IO level object providing the actual spreadsheet cells for capturing. Each backend may provide its own workbooks and sheets corresponding to:
- different implementations (e.g. the xlrd or xlwings library), or
- different origins (e.g. file-based, network-based per url).
The decision of which backend to use is taken by the sheets-factory, following a bidding process.
- sheets-factory
- IO level object acting as the caching manager for spreadsheets fetched from different backends. The caching happens per spreadsheet.
- bid
- bidding
- backend-bidding
- All backends are asked to provide their willingness to handle some xl-ref (see SimpleSheetFactory.decide_backend()). For a sibling sheet, the parent backend is always used.
- sheet
- spreadsheet
- IO level object that acts as the container of cells.
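The three call-specifier forms defined above can all be normalized into a single (func, args, kwds) triple. The sketch below is a toy re-implementation for illustration only — it is not the library's own parse_call_spec(), and it does not handle the 'opts' extraction:

```python
import json

def parse_call_spec(spec_json):
    """Normalize a JSON call-spec (string, list or object form)
    into a (func, args, kwds) triple, per the definitions above."""
    spec = json.loads(spec_json)
    if isinstance(spec, str):                       # "func_name"
        return spec, [], {}
    if isinstance(spec, list):                      # ["func_name", [...], {...}]
        func, args, kwds = spec[0], [], {}
        for part in spec[1:]:                       # optional, any order
            if isinstance(part, list):
                args = part
            elif isinstance(part, dict):
                kwds = part
            else:
                raise ValueError("invalid call-spec part: %r" % (part,))
        return func, args, kwds
    if isinstance(spec, dict):                      # {"func": ..., "args": ..., "kwds": ...}
        return spec["func"], spec.get("args", []), spec.get("kwds", {})
    raise ValueError("invalid call-spec: %r" % (spec,))
```

For example, `parse_call_spec('["f", {"k": 1}]')` yields `('f', [], {'k': 1})`, the list parts being recognized by type rather than by position.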
4.1.4. Details¶
4.1.4.1. Target-moves¶
There are 12 target-moves named with a single or a pair of
letters denoting the 4 primitive directions, LURD
:
U
UL◄───┐▲┌───►UR
LU │││ RU
▲ │││ ▲
│ │││ │
└─────┼│┼─────┘
L◄──────X──────►R
┌─────┼│┼─────┐
│ │││ │
▼ │││ ▼
LD │││ RD
DL◄───┘▼└───►DR
D
- The 'X' at the center marks the starting cell.
So a RD move means “traverse cells first by rows, then by columns”, or, more lengthily: “Start moving right till the 1st state-change, then move down to the next row, and start traversing right again.”
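The row-by-row vs column-by-column distinction can be made concrete with a small helper that lists cell coordinates in the order a move-pair implies. This is an illustrative sketch, not library code; it assumes a full pair of directions (single-letter moves are not handled):

```python
def traversal_order(nrows, ncols, moves):
    """Return zero-based (row, col) coords in the order implied by a
    primitive-direction pair, e.g. 'RD' (right first, row-by-row) or
    'DR' (down first, column-by-column)."""
    rows = range(nrows) if "D" in moves else range(nrows - 1, -1, -1)
    cols = range(ncols) if "R" in moves else range(ncols - 1, -1, -1)
    if moves[0] in "LR":   # rows-first: sweep a whole row, then advance
        return [(r, c) for r in rows for c in cols]
    else:                  # cols-first: leading 'U'/'D' sweeps columns
        return [(r, c) for c in cols for r in rows]
```

On a 2x2 grid, 'RD' visits (0,0), (0,1), (1,0), (1,1), while 'DR' visits (0,0), (1,0), (0,1), (1,1) — same cells, different order.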
4.1.4.2. Target-cells¶
Using these moves we can identify a target-cell in relation to
the landing-cell. For instance, given this xl-sheet below, there are
multiple ways to identify (or target) the non-empty values X
, below:
A B C D E F
1
2
3 X ──────► C3 A1(RD) _^(L) F3(L)
4 X ──────► E4 A4(R) _4(L) D1(DR)
5 X ──────► B5 A1(DR) A_(UR) _5(L)
6 X ──────► F6 __ _^(D) A_(R)
- The 'X' signifies non-empty cells.
So we can target cells with “absolute coordinates”, the usual A1 notation, augmented with the following special characters:
- underscore (_) for the bottom/right, and
- accent (^) for the top/left
columns/rows of the sheet with non-empty values.
When no LURD moves are specified, the target-cell coincides with the starting one.
See also
Target-termination enactment section
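Under the search-opposite rule, targeting is essentially a scan for the first full-cell. The sketch below is illustrative only, covering a single primitive direction over a boolean states-matrix; the real algorithm also handles move pairs (wrapping to the next row/column) and the other termination-rules:

```python
def target_opposite(states, landing, direction):
    """Return the first full (True) cell found while moving from the
    zero-based `landing` (row, col) along one primitive direction,
    or None if the matrix edge is reached first."""
    dr, dc = {"L": (0, -1), "U": (-1, 0), "R": (0, 1), "D": (1, 0)}[direction]
    nrows, ncols = len(states), len(states[0])
    r, c = landing
    while 0 <= r < nrows and 0 <= c < ncols:
        if states[r][c]:           # state-change: empty -> full
            return (r, c)
        r, c = r + dr, c + dc
    return None                    # would raise EmptyCaptureException
```

Returning None here corresponds to the targeting failure that the library reports as an EmptyCaptureException.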
4.1.4.3. Capturing¶
To specify a complete capture-rect we need to identify a 2nd cell. The 2nd target-cell may be specified either with absolute coordinates, as above, or with dependent ones, using a dot (.) to refer to the 1st target-cell.
In the above example-sheet, here are some ways to specify refs:
A B C D E F
1
2
┌─────┐
┌──┼─┐ │
3 │ │X│ │
│┌─┼─┼───┼┐
4 ││ │ │ X││
││ └─┼───┴┼───► C3:E4 A1(RD):..(RD) _^(L):..(DR) _4(L):A1(RD)
5 ││X │ │
│└───┼────┴───► B4:E5 A_(UR):..(RU) _5(L):1_(UR) E1(D):A.(DR)
6 │ │ X
└────┴────────► B3:C6 A1(RD):^_ ^^:C_ C_:^^
Warning
Of course, the above rects WILL FAIL, since the target-moves will stop immediately, due to the X values being surrounded by empty-cells.
But the above diagram was just meant to convey the general idea. To make it work, all the in-between cells of the peripheral rows and columns should also have been non-empty.
Note
The capturing moves from 1st target-cell to 2nd target-cell are independent from the implied target-moves in the case of dependent coords.
More specifically, the capturing will always fetch the same values
regardless of “row-first” or “column-first” order; this is not the case
with targeting (LURD
) moves.
For instance, to capture B4:E5
in the above sheet we may use
_5(L):E.(U)
.
In that case the target cells are B5
and E4
and the target-moves
to reach the 2nd one are UR
which are different from the U
specified on the 2nd cell.
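Once the two target-cells are known, capturing reduces to taking their bounding box; the order of the two corners is irrelevant, which is why the fetched values never depend on row-first vs column-first traversal. A pure-Python sketch (the sample data mirror the ArraySheet example from Basic Usage):

```python
def capture_values(sheet, st, nd):
    """Extract the rectangular values bounded by the two target cells
    `st` and `nd`, given as zero-based (row, col) tuples in any order."""
    r1, r2 = sorted((st[0], nd[0]))
    c1, c2 = sorted((st[1], nd[1]))
    return [row[c1:c2 + 1] for row in sheet[r1:r2 + 1]]

sheet = [[None, None, 'A',   None],
         [None, 2.2,  'foo', None],
         [None, None, 2,     None],
         [None, None, None,  3.14]]

# Same rect whichever corner comes first:
capture_values(sheet, (0, 1), (2, 2))   # -> [[None, 'A'], [2.2, 'foo'], [None, 2]]
```

The result matches the values field of the Lasso shown in Basic Usage for the same st/nd coords.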
4.1.4.4. Target-termination enactment¶
The guiding principle for when to enact each rule is to always capture a matrix of full-cells.
- If the landing-cell is empty-cell, always search-opposite, that is, stop on the first full-cell.
- When the landing-cell is full-cell, it depends on the ‘move-modifier’:
- If + exists, apply search-same.
- If - exists, stop on the landing-cell.
- If no modifier exists, behave like - (stop on landing-cell), except when on a 2nd edge with both its coordinates dependent (..), where search-same is applied.
- If
So, both move-modifiers apply only when the landing-cell is a full-cell, and - actually makes sense only when the 2nd edge is dependent.
If the termination condition is not met, an EmptyCaptureException is raised, which is translated as an empty capture-rect by Ranger when opts contain {"no_empty": false} (default).
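The rules above condense into a small decision table. The following sketch is illustrative only (the function name and string labels are hypothetical, not the library's API):

```python
def termination_rule(landing_is_full, modifier, nd_edge_fully_dependent=False):
    """Pick the targeting behaviour per the enactment rules:
    returns 'search-opposite', 'search-same' or 'stop-on-landing'."""
    if not landing_is_full:
        return "search-opposite"        # empty landing-cell: always
    if modifier == "+":
        return "search-same"
    if modifier == "-":
        return "stop-on-landing"
    # No modifier: like '-', except for a fully-dependent 2nd edge ('..'),
    # where search-same applies.
    return "search-same" if nd_edge_fully_dependent else "stop-on-landing"
```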
4.1.4.5. Expansions¶
Captured-rects (“values”) may be limited due to empty-cells in the 1st row/column traversed. To overcome this, the xl-ref may specify expansion directions using a 3rd :-section, like this:

_5(L):1_(UR):RDL1U1

This particular case means:
“Try expanding Right and Down repeatedly, and then try once Left and Up.”
Expansion happens on a row-by-row or column-by-column basis, and terminates when an entirely empty (or full) line is met.
Example-refs are given below for capturing the 2 marked tables:
A B C D E F G
1
┌───────────┐
│┌─────────┐│
2 ││ 1 X X ││
││ ││
3 ││X X X X││
││ ││
4 ││X X X 2 X││
││ ││
5 ││X X X X││
└┼─────────┼┴──► A1(RD):..(RD):DRL1
6 │X │
└─────────┴───► A1(RD):..(RD):L1DR A_(UR):^^(RD)
7 X
- The 'X's signify non-empty cells.
- The '1' and '2' signify the identified target-cells.
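The expansion mechanics above can be sketched as: grow the rect one line at a time towards a direction, as long as the adjacent row/column contains at least one full-cell. This is an illustrative single-direction sketch; the sequencing of multiple expansion-moves in a real xl-ref is omitted:

```python
def expand(states, r1, c1, r2, c2, direction, times=None):
    """Expand a capture-rect [r1..r2, c1..c2] (inclusive, zero-based)
    towards `direction` while the next row/column holds any full cell.
    `times=None` means infinite multiplicity."""
    nrows, ncols = len(states), len(states[0])
    while times is None or times > 0:
        if direction == "U" and r1 > 0 and any(states[r1 - 1][c1:c2 + 1]):
            r1 -= 1
        elif direction == "D" and r2 < nrows - 1 and any(states[r2 + 1][c1:c2 + 1]):
            r2 += 1
        elif direction == "L" and c1 > 0 and any(row[c1 - 1] for row in states[r1:r2 + 1]):
            c1 -= 1
        elif direction == "R" and c2 < ncols - 1 and any(row[c2 + 1] for row in states[r1:r2 + 1]):
            c2 += 1
        else:
            break                      # empty line (or sheet edge) met
        if times is not None:
            times -= 1
    return r1, c1, r2, c2
```

Note how an all-empty neighbouring line stops the loop — the same reason the U1/L1 expansions in the earlier examples are harmless when there is nothing to absorb.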
4.1.5. Plugin Extensions¶
The xleash library already uses setuptools entry-points
to attach backends and pandas filters.
Read init_plugins()
to learn how to implement other plugins.
4.1.6. API¶
User-facing higher-level functionality:

Lasso(xl_ref, url_file, sh_name, st_edge, …)
    All the fields used by the algorithm, populated stage-by-stage by Ranger.
lasso(xlref[, sheets_factory, base_opts, …])
    High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings, by using a Ranger internally.
Ranger(sheets_factory[, base_opts, …])
    The director-class that performs all stages required for “throwing the lasso” around rect-values.
Ranger.do_lasso(xlref, **context_kwds)
    The director-method that does all the job of throwing a lasso around spreadsheet’s rect-regions according to xl-ref.
make_default_Ranger([sheets_factory, …])
    Makes a defaulted Ranger.
get_default_opts([overrides])
    Default opts used by lasso() when constructing its internal Ranger.

Related to the capturing algorithm:

resolve_capture_rect(states_matrix, …[, …])
    Performs targeting, capturing and expansions based on the states-matrix.
coords2Cell(row, col)
    Make an A1 Cell from resolved coords, with rudimentary error-checking.
EmptyCaptureException
    Thrown when targeting fails.
xlwings_dims_call_spec()
    A list call-spec for the _redim_filter() filter that imitates the results of the xlwings library.

Related to parsing and the basic structures used throughout (pandalone.xleash._parse):

parse_xlref, parse_expansion_moves, parse_call_spec, Cell, Coords, Edge
IO back-end functionality:
backend.SheetsFactory([backends])
    A caching-store of ABCSheet instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.
backend.ABCBackend
    A plugin for a backend must implement and add instances into io_backends.
backend.ABCBackend.open_sheet(wb_url, sheet_id)
    Open an ABCSheet subclass, if the backend has won the bid.
backend.ABCSheet.read_rect(st, nd)
    Fetch the actual values from the backend Excel-sheet.
backend.ArraySheet(arr[, ids])
    A sample ABCSheet made out of 2D-lists or numpy-arrays, for facilitating tests.
backend.ABCSheet
    A delegating-to-backend factory and sheet-wrapper with utility methods.
_xlrd.XlrdSheet(sheet, book_fname[, epoch1904])
    The xlrd workbook wrapper required by the xleash library.
_xlrd._open_sheet_by_name_or_index(…)
    param int or str or None sheet_id:

Plugin related:

_init_plugins, _plugins_installed, _PLUGIN_GROUP_NAME, io_backends, installed_filters
-
pandalone.xleash.
_init_plugins
(plugin_group_name='pandalone.xleash.plugins')[source]¶ Discover and load plugins.
The xleash library already uses setuptools entry-points to attach backend Sheets and pandas filters.
You may re-invoke it after some pip install <some-xleash-plugin>.

## setup.py configurations

To implement a new plugin, you have to package your code as a regular python distribution and add the following declaration inside its setup.py:

setup(
    # ...
    entry_points={
        'pandalone.xleash.plugins': [
            'plugin_1 = <foo.plugin.module>:<plugin-install-func>',  ## Load & install.
            'plugin_2 = <bar.plugin.module>',                        ## Load only.
        ]
    }
)
## Implementing a plugin
The plugins are initialized during import time in a 2-stage procedure by
init_plugins()
. A plugin is loaded and optionally installed if the setup-configuration above specifies a no-args<plugin-install-func>
callable. Any collected<plugin-install-func>
callables are invoked AFTER all plugin-modules have finished loading.Tip
For example, study this project to see how it sets up its own backends and filters.
Warning
When appending into “hook” lists during installation, remember to avoid re-inserting duplicate items. In general try to well-behave even when plugins are initialized multiple times!
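The same entry-point discovery can be sketched with the stdlib importlib.metadata. This is illustrative only — the library's own _init_plugins() has its 2-stage load/install logic and predates this API:

```python
from importlib.metadata import entry_points

def discover_plugins(group="pandalone.xleash.plugins"):
    """Load every object advertised under the given entry-point group,
    as any installed <some-xleash-plugin> distribution would declare."""
    try:
        eps = entry_points(group=group)        # Python >= 3.10 selection API
    except TypeError:
        eps = entry_points().get(group, [])    # Python 3.8/3.9 dict-style API
    return [ep.load() for ep in eps]
```

With no matching distributions installed, the function simply returns an empty list, so it is safe to call unconditionally at import time.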
-
pandalone.xleash.
_PLUGIN_GROUP_NAME
= 'pandalone.xleash.plugins'¶ Used to discover setuptools extension-points.
-
pandalone.xleash.
resolve_capture_rect
(states_matrix, up_dn_margins, st_edge, nd_edge=None, exp_moves=None, base_coords=None)[source]¶ Performs targeting, capturing and expansions based on the states-matrix.
To get the margin_coords, use one of margin_coords_from_states_matrix() or ABCSheet.get_margin_coords().
Its results can be fed into read_capture_values().
Parameters:
- states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
- up_dn_margins ((Coords, Coords)) – the top-left/bottom-right coords with full-cells.
- st_edge (Edge) – “uncooked”, as matched by the regex.
- nd_edge (Edge) – “uncooked”, as matched by the regex.
- exp_moves (list or None) – Just the parsed string, and not None.
- base_coords (Coords) – The base for a dependent 1st edge.
Returns: a (Coords, Coords) pair with the 1st and 2nd capture-cells, ordered from top-left –> bottom-right.
Return type: (Coords, Coords)
Raises: EmptyCaptureException – When targeting failed and no target-cell was identified.
- Examples::
>>> import numpy as np
>>> from pandalone.xleash import Edge, margin_coords_from_states_matrix

>>> states_matrix = np.array([
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 0, 0, 0],
...     [0, 0, 0, 1, 1, 1],
...     [0, 0, 1, 0, 0, 1],
...     [0, 0, 1, 1, 1, 1]
... ], dtype=bool)
>>> up, dn = margin_coords_from_states_matrix(states_matrix)

>>> st_edge = Edge(Cell('1', 'A'), 'DR')
>>> nd_edge = Edge(Cell('.', '.'), 'DR')
>>> resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
(Coords(row=3, col=2), Coords(row=4, col=2))

Using dependent coordinates for the 2nd edge:

>>> st_edge = Edge(Cell('_', '_'), None)
>>> nd_edge = Edge(Cell('.', '.'), 'UL')
>>> rect = resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
>>> rect
(Coords(row=2, col=2), Coords(row=4, col=5))

Using sheet’s margins:

>>> st_edge = Edge(Cell('^', '_'), None)
>>> nd_edge = Edge(Cell('_', '^'), None)
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True

Walking backwards:

>>> st_edge = Edge(Cell('^', '_'), 'L')       # Landing is full, so 'L' ignored.
>>> nd_edge = Edge(Cell('_', '_'), 'L', '+')  # '+' would also stop.
>>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge)
True
- states_matrix (np.ndarray) – A 2D-array with
-
class
pandalone.xleash.
ABCSheet
[source]¶ Bases:
abc.ABC
A delegating to backend factory and sheet-wrapper with utility methods.
Parameters: - _states_matrix (np.ndarray) – The states-matrix cached, so recreate object to refresh it.
- _margin_coords (dict) – limits used by
_resolve_cell()
, cached, so recreate object to refresh it.
Resource management is outside of the scope of this class, and must happen in the backend workbook/sheet instance.
xlrd examples:

>>> import xlrd
>>> with xlrd.open_workbook(self.tmp) as wb:
...     sheet = xleash.XlrdSheet(wb.sheet_by_name('Sheet1'))
...     ## Do whatever

win32 examples:

>>> with dsgdsdsfsd as wb:
...     sheet = xleash.win32Sheet(wb.sheet['Sheet1'])

TODO: Win32 Sheet example
-
_read_margin_coords
()[source]¶ Override if possible to read (any of the) limits directly from the sheet.
Returns: the 2 coords of the top-left & bottom-right full-cells; any of the coords can be None. By default returns (None, None).
Return type: (Coords, Coords)
Raise: EmptyCaptureException if sheet empty
-
_read_states_matrix
()[source]¶ Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
-
get_margin_coords
()[source]¶ Extract (and cache) margins either internally or from
margin_coords_from_states_matrix()
.Returns: the resolved top-left and bottom-right xleash.Coords
Return type: tuple Raise: EmptyCaptureException if sheet empty
-
get_sheet_ids
()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
get_states_matrix
()[source]¶ Read and cache the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
Raise: EmptyCaptureException if sheet empty
-
read_rect
(st, nd)[source]¶ Fetch the actual values from the backend Excel-sheet.
Parameters: Returns: - Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Return type: Raise: EmptyCaptureException (optionally) if sheet empty
-
class
pandalone.xleash.
ArraySheet
(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶ Bases:
pandalone.xleash.io.backend.ABCSheet
A sample
ABCSheet
made out of 2D-list or numpy-arrays, for facilitating tests.-
__init__
(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_read_states_matrix
()[source]¶ Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False wherever cells are blank or empty.
Return type: ndarray
-
get_sheet_ids
()[source]¶
Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index
Return type: SheetId or None
-
read_rect
(st, nd)[source]¶ Fetch the actual values from the backend Excel-sheet.
Parameters: Returns: - Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Return type: Raise: EmptyCaptureException (optionally) if sheet empty
-
-
pandalone.xleash.
coords2Cell
(row, col)[source]¶ Make A1
Cell
from resolved coords, with rudimentary error-checking.Examples:
>>> coords2Cell(row=0, col=0)
Cell(row='1', col='A')
>>> coords2Cell(row=0, col=26)
Cell(row='1', col='AA')
>>> coords2Cell(row=10, col='.')
Cell(row='11', col='.')
>>> coords2Cell(row=-3, col=-2)
Traceback (most recent call last):
AssertionError: negative row!
-
exception
pandalone.xleash.
EmptyCaptureException
[source]¶ Bases:
Exception
Thrown when targeting fails.
-
pandalone.xleash.
margin_coords_from_states_matrix
(states_matrix)[source]¶ Returns the top-left/bottom-right margins of full-cells from a states-matrix.
May be used by
ABCSheet.get_margin_coords()
if a backend does not report the sheet-margins internally.
Parameters: states_matrix (np.ndarray) – A 2D-array with False wherever cells are blank or empty. Use ABCSheet.get_states_matrix() to derive it.
Returns: the 2 coords of the top-left & bottom-right full-cells
Return type: (Coords, Coords)
- Examples::
>>> states_matrix = np.asarray([
...     [0, 0, 0],
...     [0, 1, 0],
...     [0, 1, 1],
...     [0, 0, 1],
... ])
>>> margins = margin_coords_from_states_matrix(states_matrix)
>>> margins
(Coords(row=1, col=1), Coords(row=3, col=2))
Note that the bottom-right margin is not the same as the states_matrix size:

>>> states_matrix = np.asarray([
...     [0, 0, 0, 0],
...     [0, 1, 0, 0],
...     [0, 1, 1, 0],
...     [0, 0, 1, 0],
...     [0, 0, 0, 0],
... ])
>>> margin_coords_from_states_matrix(states_matrix) == margins
True
-
pandalone.xleash.
lasso
(xlref, sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds)[source]¶ High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings by using internally a
Ranger
.Parameters: - xlref (str) –
a string with the xl-ref format:
<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
i.e.:
file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
- sheets_factory – Factory of sheets from where to parse rect-values; if unspecified, the new SheetsFactory created is closed afterwards.
- base_opts (dict or None) – Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
- available_filters (dict or None) – Delegated to make_default_Ranger(), so items override default ones; use a new Ranger if that is not desired.
if that is not desired. - return_lasso (bool) –
If
True
, values are contained in the returned Lasso instance, along with all other artifacts of the lassoing procedure.For more debugging help, create a
Range
yourself and inspect theRanger.intermediate_lasso
. - context_kwds (Lasso) – Default
Lasso
fields in case parsed ones areNone
(i.e. you can specify the sheet like that).
Variables: base_opts – Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every
Ranger.do_lasso()
, whether invoked directly or recursively by recursive_filter()
. Read the code to be sure of the available choices. Delegated to make_default_Ranger()
, so items override default ones; use a new Ranger
if that is not desired.Returns: Either the captured & filtered values or the final
Lasso
, depending on the return_lasso
arg.Example:
sheet = _
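The xl-ref format above is just a URL with a structured fragment; composing one from its named parts can be sketched as follows (the part values are taken from the example string in the parameter description, and the dict/format code is purely illustrative, not an xleash API):

```python
# Illustrative only: assemble the example xl-ref from its named parts.
parts = dict(
    url_file='file:///path/to/file.xls',  # workbook URL
    sheet='sheet_name',                   # sheet name or index
    st_edge='UPT8(LU-)',                  # 1st edge: landing-cell + moves + mod
    nd_edge='_.(D+)',                     # 2nd edge
    expand='LDL1',                        # rect-expansion moves
    js_filt='{"dims":1}',                 # json-filter
)
xlref = '{url_file}#{sheet}!{st_edge}:{nd_edge}:{expand}{js_filt}'.format(**parts)
```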
-
class
pandalone.xleash.
Ranger
(sheets_factory, base_opts=None, available_filters=None)[source]¶ Bases:
object
The director-class that performs all stages required for “throwing the lasso” around rect-values.
Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.
The
do_lasso()
does the job.Variables: - sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; does not
close it in the end.
May be
None
, butdo_lasso()
will scream unless invoked with acontext_lasso
arg containing a concreteABCSheet
. - base_opts (dict) – The opts that are deep-copied and used as the defaults
for every
do_lasso()
, whether invoked directly or recursively byrecursive_filter()
. If unspecified, no opts are used, but this attr is set to an empty dict. Seeget_default_opts()
- available_filters (dict or None) – The filters available for an xl-ref to use.
If
None
, then uses xleash.installed_filters
. Use an empty dict not to use any filters. - intermediate_lasso (Lasso) – A
('stage', Lasso)
pair with the lastLasso
instance produced during the last execution of thedo_lasso()
. Used for inspecting/debugging.
-
__init__
(sheets_factory, base_opts=None, available_filters=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_make_init_Lasso
(**context_kwds)[source]¶ Creates the lasso to be used for each new
do_lasso()
invocation.
-
_parse_and_merge_with_context
(xlref, init_lasso)[source]¶ Merges xl-ref parsed-fields with
init_lasso
, reporting any errors. Parameters: init_lasso (Lasso) – Default values to be overridden by non-nulls. Returns: a Lasso with any non-None
parsed-fields updated
-
_resolve_capture_rect
(lasso, sheet)[source]¶ Also handles
EmptyCaptureException
in case opts['no_empty'] != False
.
-
class
pandalone.xleash.
SheetsFactory
(backends=None)[source]¶ Bases:
pandalone.xleash.io.backend.SimpleSheetsFactory
A caching-store of
ABCSheet
instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.Variables: _cached_sheets (dict) – A cache of all _Spreadsheets accessed so far, keyed by multiple keys generated by _derive_sheet_keys()
.- To avoid opening non-trivial workbooks, use the
add_sheet()
to pre-populate this cache with them. - It is a resource-manager for contained sheets, so it can be used with
a
with
statement.
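The caching and resource-management behaviour described above can be sketched with a minimal stand-in class (illustrative only; the real SheetsFactory also derives multiple keys per sheet and asks backends to open workbooks):

```python
class MiniSheetsFactory:
    """Minimal sketch of SheetsFactory's cache + context-manager pattern."""

    def __init__(self):
        self._cached_sheets = {}

    def add_sheet(self, sheet, wb_id, sh_id):
        # Pre-populate the cache to avoid opening the workbook later.
        self._cached_sheets[(wb_id, sh_id)] = sheet

    def fetch_sheet(self, wb_id, sh_id):
        return self._cached_sheets.get((wb_id, sh_id))

    def close(self):
        self._cached_sheets.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Used with a `with` statement, all cached sheets are released on exit.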
-
__init__
(backends=None)[source]¶ Parameters: backends – The list of backends
to consider when opening sheets. If it evaluates to false, io_backends
is assumed. Type backends: list or None
-
_derive_sheet_keys
(sheet, wb_ids=None, sh_ids=None)[source]¶ Returns the product of user-specified and sheet-internal keys.
Parameters: - wb_ids – a single or a sequence of extra workbook-ids (ie: file, url)
- sh_ids – a single or sequence of extra sheet-ids (ie: name, index, None)
-
pandalone.xleash.
io_backends
= [<pandalone.xleash.io._xlrd.XlrdBackend object>]¶ Hook for plugins to append
ABCBackend
instances.
-
pandalone.xleash.
make_default_Ranger
(sheets_factory=None, base_opts=None, available_filters=None)[source]¶ Makes a defaulted
Ranger
.Parameters: - sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
a new
SheetsFactory
is created. Remember to invoke itsSheetsFactory.close()
to clear resources from any opened sheets. - base_opts (dict or None) –
Default opts to affect the lassoing, to be merged with defaults; uses
get_default_opts()
Read the code to be sure of the available choices :-(.
- available_filters (dict or None) – The filters available for a xl-ref to use.
(
xleash.installed_filters
used if unspecified).
For instance, to make your own sheets-factory and override options, you may do this:
>>> from pandalone import xleash >>> with xleash.SheetsFactory() as sf: ... xleash.make_default_Ranger(sf, base_opts={'lax': True}) <pandalone.xleash._lasso.Ranger object at ...
-
class
pandalone.xleash.
XLocation
(sheet, st, nd, base_coords)¶ Bases:
tuple
Fields denoting the position of a sheet/cell while running an element-wise filter.
Practically,
run_filter_elementwise() preserves these fields if the processed ones were None
.-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, sheet, st, nd, base_coords)¶ Create new instance of XLocation(sheet, st, nd, base_coords)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new XLocation object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new XLocation object replacing specified fields with new values
-
base_coords
¶ Alias for field number 3
-
nd
¶ Alias for field number 2
-
sheet
¶ Alias for field number 0
-
st
¶ Alias for field number 1
-
-
pandalone.xleash.
get_default_opts
(overrides=None)[source]¶ Default opts used by
lasso()
when constructing its internalRanger
Parameters: overrides (dict or None) – Any items to update the default ones.
-
pandalone.xleash.
installed_filters
= {'df': {'func': <function _df_filter>}, 'dict': {'desc': "dict() -> new empty dictionary\ndict(mapping) -> new dictionary initialized from a mapping object's\n (key, value) pairs\ndict(iterable) -> new dictionary initialized as if via:\n d = {}\n for k, v in iterable:\n d[k] = v\ndict(**kwargs) -> new dictionary initialized with the name=value pairs\n in the keyword argument list. For example: dict(one=1, two=2)", 'func': <function install_default_filters.<locals>.<lambda>>}, 'numpy': {'desc': "array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)\n\n Create an array.\n\n Parameters\n ----------\n object : array_like\n An array, any object exposing the array interface, an object whose\n __array__ method returns an array, or any (nested) sequence.\n dtype : data-type, optional\n The desired data-type for the array. If not given, then the type will\n be determined as the minimum type required to hold the objects in the\n sequence.\n copy : bool, optional\n If true (default), then the object is copied. Otherwise, a copy will\n only be made if __array__ returns a copy, if obj is a nested sequence,\n or if a copy is needed to satisfy any of the other requirements\n (`dtype`, `order`, etc.).\n order : {'K', 'A', 'C', 'F'}, optional\n Specify the memory layout of the array. 
If object is not an array, the\n newly created array will be in C order (row major) unless 'F' is\n specified, in which case it will be in Fortran order (column major).\n If object is an array the following holds.\n\n ===== ========= ===================================================\n order no copy copy=True\n ===== ========= ===================================================\n 'K' unchanged F & C order preserved, otherwise most similar order\n 'A' unchanged F order if input is F and not C, otherwise C order\n 'C' C order C order\n 'F' F order F order\n ===== ========= ===================================================\n\n When ``copy=False`` and a copy is made for other reasons, the result is\n the same as if ``copy=True``, with some exceptions for `A`, see the\n Notes section. The default order is 'K'.\n subok : bool, optional\n If True, then sub-classes will be passed-through, otherwise\n the returned array will be forced to be a base-class array (default).\n ndmin : int, optional\n Specifies the minimum number of dimensions that the resulting\n array should have. Ones will be pre-pended to the shape as\n needed to meet this requirement.\n\n Returns\n -------\n out : ndarray\n An array object satisfying the specified requirements.\n\n See Also\n --------\n empty_like : Return an empty array with shape and type of input.\n ones_like : Return an array of ones with shape and type of input.\n zeros_like : Return an array of zeros with shape and type of input.\n full_like : Return a new array with shape of input filled with value.\n empty : Return a new uninitialized array.\n ones : Return a new array setting values to one.\n zeros : Return a new array setting values to zero.\n full : Return a new array of given shape filled with value.\n\n\n Notes\n -----\n When order is 'A' and `object` is an array in neither 'C' nor 'F' order,\n and a copy is forced by a change in dtype, then the order of the result is\n not necessarily 'C' as expected. 
This is likely a bug.\n\n Examples\n --------\n >>> np.array([1, 2, 3])\n array([1, 2, 3])\n\n Upcasting:\n\n >>> np.array([1, 2, 3.0])\n array([ 1., 2., 3.])\n\n More than one dimension:\n\n >>> np.array([[1, 2], [3, 4]])\n array([[1, 2],\n [3, 4]])\n\n Minimum dimensions 2:\n\n >>> np.array([1, 2, 3], ndmin=2)\n array([[1, 2, 3]])\n\n Type provided:\n\n >>> np.array([1, 2, 3], dtype=complex)\n array([ 1.+0.j, 2.+0.j, 3.+0.j])\n\n Data-type consisting of more than one element:\n\n >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])\n >>> x['a']\n array([1, 3])\n\n Creating an array from sub-classes:\n\n >>> np.array(np.mat('1 2; 3 4'))\n array([[1, 2],\n [3, 4]])\n\n >>> np.array(np.mat('1 2; 3 4'), subok=True)\n matrix([[1, 2],\n [3, 4]])", 'func': <function install_default_filters.<locals>.<lambda>>}, 'odict': {'desc': 'Dictionary that remembers insertion order', 'func': <function install_default_filters.<locals>.<lambda>>}, 'pipe': {'func': <function pipe_filter>}, 'py': {'func': <function py_filter>}, 'pyeval': {'func': <function pyeval_filter>}, 'recurse': {'func': <function recursive_filter>}, 'redim': {'func': <function redim_filter>}, 'sorted': {'desc': 'Return a new list containing all items from the iterable in ascending order.\n\nA custom key function can be supplied to customize the sort order, and the\nreverse flag can be set to request the result in descending order.', 'func': <function install_default_filters.<locals>.<lambda>>}, 'sr': {'desc': 'Converts a 2-columns list-of-lists into pd.Series.\n\n One-dimensional ndarray with axis labels (including time series).\n\n Labels need not be unique but must be a hashable type. The object\n supports both integer- and label-based indexing and provides a host of\n methods for performing operations involving the index. 
Statistical\n methods from ndarray have been overridden to automatically exclude\n missing data (currently represented as NaN).\n\n Operations between Series (+, -, /, *, **) align values based on their\n associated index values-- they need not be the same length. The result\n index will be the sorted union of the two indexes.\n\n Parameters\n ----------\n data : array-like, Iterable, dict, or scalar value\n Contains data stored in Series.\n\n .. versionchanged :: 0.23.0\n If data is a dict, argument order is maintained for Python 3.6\n and later.\n\n index : array-like or Index (1d)\n Values must be hashable and have the same length as `data`.\n Non-unique index values are allowed. Will default to\n RangeIndex (0, 1, 2, ..., n) if not provided. If both a dict and index\n sequence are used, the index will override the keys found in the\n dict.\n dtype : str, numpy.dtype, or ExtensionDtype, optional\n Data type for the output Series. If not specified, this will be\n inferred from `data`.\n See the :ref:`user guide <basics.dtypes>` for more usages.\n copy : bool, default False\n Copy input data.\n ', 'func': <function install_filters.<locals>.<lambda>>}}¶ Hook for plugins to append filters.
-
class
pandalone.xleash.
Lasso
(xl_ref, url_file, sh_name, st_edge, nd_edge, exp_moves, call_spec, sheet, st, nd, values, base_coords, opts)¶ Bases:
tuple
All the fields used by the algorithm, populated stage-by-stage by
Ranger
.Parameters: - xl_ref (str) – The full url, populated on parsing.
- sh_name (str) –
Parsed sheet name (or index, but still as string), populated on parsing.
Note
If you need the name of the captured sheet, use:
lasso.sheet.get_sheet_ids().ids[0]
- st_edge (Edge) – The 1st edge, populated on parsing.
- nd_edge (Edge) – The 2nd edge, populated on parsing.
- st (Coords) – The top-left targeted coords of the capture-rect, populated on capturing.
- nd (Coords) – The bottom-right targeted coords of the capture-rect, populated on capturing.
- sheet (ABCSheet) – The sheet fetched from the factory (or the ranger's current sheet), populated after capturing, before reading.
- values – The excel table-values captured by the lasso, populated after reading and updated while applying filters.
- call_spec – The call-spec derived from the parsed filters, to be fed
into
Ranger.make_call()
. - base_coords (Coords) – On recursive calls it becomes the base-cell for the 1st edge.
- opts (dict or ChainMap) –
- Before
parsing
, they are just any ‘opts’ dict found in the filters. - After parsing, a 2-map ChainMap with Ranger.base_opts and options extracted from the filters on top.
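The 2-map ChainMap described above is plain stdlib behaviour; a quick sketch (the opts names here are hypothetical, not actual xleash options):

```python
from collections import ChainMap

base_opts = {'lax': False, 'verbose': False}   # stands in for Ranger.base_opts
filter_opts = {'lax': True}                    # opts extracted from the filters
opts = ChainMap(filter_opts, base_opts)        # filter opts shadow the base ones
```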
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, xl_ref=None, url_file=None, sh_name=None, st_edge=None, nd_edge=None, exp_moves=None, call_spec=None, sheet=None, st=None, nd=None, values=None, base_coords=None, opts=None)¶ Create new instance of Lasso(xl_ref, url_file, sh_name, st_edge, nd_edge, exp_moves, call_spec, sheet, st, nd, values, base_coords, opts)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new Lasso object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new Lasso object replacing specified fields with new values
-
base_coords
¶ Alias for field number 11
-
call_spec
¶ Alias for field number 6
-
exp_moves
¶ Alias for field number 5
-
nd
¶ Alias for field number 9
-
nd_edge
¶ Alias for field number 4
-
opts
¶ Alias for field number 12
-
sh_name
¶ Alias for field number 2
-
sheet
¶ Alias for field number 7
-
st
¶ Alias for field number 8
-
st_edge
¶ Alias for field number 3
-
url_file
¶ Alias for field number 1
-
values
¶ Alias for field number 10
-
xl_ref
¶ Alias for field number 0
-
pandalone.xleash.
xlwings_dims_call_spec
()[source]¶ A list call-spec for
_redim_filter()
filter that imitates the results of the xlwings library.
-
class
pandalone.xleash.
Cell
[source]¶ Bases:
pandalone.xleash._parse.Cell
A pair of 1-based strings, denoting the “A1” coordinates of a cell.
The “num” coords (numeric, 0-based) are specified using numpy-arrays (
Coords
).
-
class
pandalone.xleash.
Coords
(row, col)¶ Bases:
tuple
A pair of 0-based integers denoting the “num” coordinates of a cell.
The “A1” coords (1-based coordinates) are specified using
Cell
.-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, row, col)¶ Create new instance of Coords(row, col)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new Coords object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new Coords object replacing specified fields with new values
-
col
¶ Alias for field number 1
-
row
¶ Alias for field number 0
-
-
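Coords (like the other field-tuples documented here) is a plain namedtuple, so the _replace()/_asdict() helpers listed above behave exactly as in the stdlib. For instance:

```python
from collections import namedtuple

# 0-based "num" coordinates, mirroring xleash's Coords namedtuple.
Coords = namedtuple('Coords', 'row col')

c = Coords(row=1, col=2)
shifted = c._replace(row=5)   # a new tuple; the original is untouched
as_dict = c._asdict()         # maps field names to values
```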
class
pandalone.xleash.
Edge
[source]¶ Bases:
pandalone.xleash._parse.Edge
All the info required to target a cell.
An Edge contains an A1
Cell
as its land
.Parameters: - land (Cell) – the landing-cell
- mov (str) – use None for missing moves.
- mod (str) – one of (
+
,-
orNone
)
-
class
pandalone.xleash.
CallSpec
(func, args, kwds)¶ Bases:
tuple
The call-specifier for holding the parsed json-filters.
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, func, args=[], kwds={})¶ Create new instance of CallSpec(func, args, kwds)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new CallSpec object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new CallSpec object replacing specified fields with new values
-
args
¶ Alias for field number 1
-
func
¶ Alias for field number 0
-
kwds
¶ Alias for field number 2
-
4.1.7. Submodule: pandalone.xleash._parse
¶
The syntax-parsing part of xleash.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash._parse.
CallSpec
(func, args, kwds)¶ Bases:
tuple
The call-specifier for holding the parsed json-filters.
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, func, args=[], kwds={})¶ Create new instance of CallSpec(func, args, kwds)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new CallSpec object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new CallSpec object replacing specified fields with new values
-
args
¶ Alias for field number 1
-
func
¶ Alias for field number 0
-
kwds
¶ Alias for field number 2
-
-
class
pandalone.xleash._parse.
Cell
[source]¶ Bases:
pandalone.xleash._parse.Cell
A pair of 1-based strings, denoting the “A1” coordinates of a cell.
The “num” coords (numeric, 0-based) are specified using numpy-arrays (
Coords
).
-
class
pandalone.xleash._parse.
Edge
[source]¶ Bases:
pandalone.xleash._parse.Edge
All the info required to target a cell.
An Edge contains an A1
Cell
as its land
.Parameters: - land (Cell) – the landing-cell
- mov (str) – use None for missing moves.
- mod (str) – one of (
+
,-
orNone
)
-
pandalone.xleash._parse.
Edge_new
(row, col, mov=None, mod=None, default=None)[source]¶ Make a new
Edge
from any non-None values supplied, capitalized, or nothing. Parameters: Returns: an
Edge
if any value is non-None
>>> Edge_new('1', 'a', 'Rul', '-') Edge(land=Cell(row='1', col='A'), mov='RUL', mod='-') >>> print(Edge_new('5', '5')) R5C5
No error checking performed:
>>> Edge_new('Any', 'foo', 'BaR', '+_&%') Edge(land=Cell(row='ANY', col='FOO'), mov='BAR', mod='+_&%') >>> print(Edge_new(None, None, None, None)) None
except where coincidental:
>>> Edge_new(row=0, col=123, mov='BAR', mod=None) Traceback (most recent call last): AttributeError: 'int' object has no attribute 'upper' >>> Edge_new(row=0, col='A', mov=123, mod=None) Traceback (most recent call last): AttributeError: 'int' object has no attribute 'upper'
-
pandalone.xleash._parse.
_excel_str_translator
= {8220: 34, 8221: 34}¶ Excel uses these !@#% chars for double-quotes, which are not valid in JSON-strings!
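The table is meant for str.translate(); a quick demonstration (codepoints 8220/8221 are the curly “ and ” quotes, 34 is the ASCII double-quote):

```python
# Map Excel's curly double-quotes to plain '"' so the filter part
# parses as valid JSON (same table as _excel_str_translator).
_excel_str_translator = {8220: 34, 8221: 34}

raw = '\u201cdims\u201d: 1'           # what Excel may store
fixed = raw.translate(_excel_str_translator)
```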
-
pandalone.xleash._parse.
_parse_xlref
(xlref)[source]¶ Parse a xl-ref into a dict.
Parameters: xlref (str) – A url-string abiding to the xl-ref syntax. Returns: A dict with all fields, with None for those missing. Return type: dict Examples:
>>> res = parse_xlref('workbook.xlsx#Sheet1!A1(DR+):Z20(UL):L1U2R1D1:' ... '{"opts":{}, "func": "foo"}') >>> sorted(res.items()) [('call_spec', CallSpec(func='foo', args=[], kwds={})), ('exp_moves', 'L1U2R1D1'), ('nd_edge', Edge(land=Cell(row='20', col='Z'), mov='UL', mod=None)), ('opts', {}), ('sh_name', 'Sheet1'), ('st_edge', Edge(land=Cell(row='1', col='A'), mov='DR', mod='+')), ('url_file', 'workbook.xlsx'), ('xl_ref', 'workbook.xlsx#Sheet1!A1(DR+):Z20(UL):L1U2R1D1:{"opts":{}, "func": "foo"}')]
Shortcut for all sheet from top-left to bottom-right full-cells:
>>> res=parse_xlref('#:') >>> sorted(res.items()) [('call_spec', None), ('exp_moves', None), ('nd_edge', Edge(land=Cell(row='_', col='_'), mov=None, mod=None)), ('opts', None), ('sh_name', None), ('st_edge', Edge(land=Cell(row='^', col='^'), mov=None, mod=None)), ('url_file', None), ('xl_ref', '#:')]
Errors:
>>> parse_xlref('A1(DR)Z20(UL)') Traceback (most recent call last): SyntaxError: No fragment-part (starting with '#'): A1(DR)Z20(UL) >>> parse_xlref('#A1(DR)Z20(UL)') ## Missing ':'. Traceback (most recent call last): SyntaxError: Not an `xl-ref` syntax: A1(DR)Z20(UL)
But as soon as syntax is matched, subsequent errors raised are
ValueErrors
:>>> parse_xlref("#A1:B1:{'Bad_JSON_str'}") Traceback (most recent call last): ValueError: Filters are not valid JSON: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) JSON: {'Bad_JSON_str'}
-
pandalone.xleash._parse.
_regular_xlref_regex
= re.compile('\n ^\\s*(?:(?P<sh_name>[^!]+)?!)? # xl sheet name\n (?: # 1st-edge\n (?:\n (?:\n , re.IGNORECASE|re.DOTALL|re.VERBOSE)¶ The regex for parsing regular xl-ref.
-
pandalone.xleash._parse.
_repeat_moves
(moves, times=None)[source]¶ Returns an iterator that repeats
moves
x times
, or infinitely if unspecified. Used when parsing primitive directions.
Parameters: Returns: An iterator of the moves
Return type: iterator
Examples:
>>> list(_repeat_moves('LUR', '3')) ['LUR', 'LUR', 'LUR'] >>> list(_repeat_moves('ABC', '0')) [] >>> _repeat_moves('ABC') ## infinite repetitions repeat('ABC')
-
pandalone.xleash._parse.
parse_call_spec
(call_spec_values)[source]¶ Parse call-specifier from json-filters.
Parameters: call_spec_values – This is a non-null structure specifying some function call in the
filter
part, which can be either: - string:
"func_name"
- list:
["func_name", ["arg1", "arg2"], {"k1": "v1"}]
where the last 2 parts are optional and can be given in any order; - object:
{"func": "func_name", "args": ["arg1"], "kwds": {"k":"v"}}
where theargs
andkwds
are optional.
Returns: the 3-tuple func, args=(), kwds={}
with the defaults as shown when missing.
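Normalizing the three accepted shapes into the (func, args, kwds) 3-tuple can be sketched as follows; this is a simplified stand-in for the real parse_call_spec(), with its error handling and validation omitted:

```python
def normalize_call_spec(v):
    """Reduce the three call-spec shapes to (func, args, kwds)."""
    if isinstance(v, str):                  # "func_name"
        return v, [], {}
    if isinstance(v, list):                 # ["func_name", [...], {...}]
        func, args, kwds = v[0], [], {}
        for part in v[1:]:                  # both optional, in any order
            if isinstance(part, list):
                args = part
            elif isinstance(part, dict):
                kwds = part
        return func, args, kwds
    if isinstance(v, dict):                 # {"func": ..., "args": ..., "kwds": ...}
        return v['func'], v.get('args', []), v.get('kwds', {})
    raise ValueError('unsupported call-spec: %r' % (v,))
```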
-
pandalone.xleash._parse.
parse_expansion_moves
(exp_moves)[source]¶ Parse rect-expansion into a list of dir-letters iterables.
Parameters: exp_moves – A string with a sequence of primitive moves: es. L1U1R1D1 Returns: A list of primitive-dir chains. Return type: list Examples:
>>> res = parse_expansion_moves('lu1urd?') >>> res [repeat('L'), repeat('U', 1), repeat('UR'), repeat('D', 1)] # infinite generator >>> [next(res[0]) for i in range(10)] ['L', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'L', 'L'] >>> list(res[1]) ['U'] >>> parse_expansion_moves('1LURD') Traceback (most recent call last): ValueError: Invalid rect-expansion(1LURD) due to: 'NoneType' object has no attribute 'groupdict'
-
pandalone.xleash._parse.
parse_xlref
(xlref)[source]¶ Like
_parse_xlref()
but also tries if xlref
is encased by the delimiter chars /\"$%&
.See also
_encase_regex
-
pandalone.xleash._parse.
parse_xlref_fragment
(xlref_fragment)[source]¶ Parses an xl-ref fragment, anything to the right of the hash(
#
).Parameters: xlref_fragment (str) – the url-fragment part of the xl-ref string, including the '#'
char.Returns: dictionary containing the following parameters: - sheet: (str, int, None) i.e.
sheet_name
- st_edge: (Edge, None) the 1st-ref, with raw cell
i.e.
Edge(land=Cell(row='8', col='UPT'), mov='LU', mod='-')
- nd_edge: (Edge, None) the 2nd-ref, with raw cell
i.e.
Edge(land=Cell(row='_', col='.'), mov='D', mod='+')
- exp_moves: (sequence, None), as i.e.
LDL1
parsed byparse_expansion_moves()
- js_filt: dict i.e.
{"dims: 1}
Return type: dict Examples:
>>> res = parse_xlref_fragment('Sheet1!A1(DR+):Z20(UL):L1U2R1D1:' ... '{"opts":{}, "func": "foo"}') >>> sorted(res.items()) [('call_spec', CallSpec(func='foo', args=[], kwds={})), ('exp_moves', 'L1U2R1D1'), ('nd_edge', Edge(land=Cell(row='20', col='Z'), mov='UL', mod=None)), ('opts', {}), ('sh_name', 'Sheet1'), ('st_edge', Edge(land=Cell(row='1', col='A'), mov='DR', mod='+'))]
Shortcut for all sheet from top-left to bottom-right full-cells:
>>> res = parse_xlref_fragment(':') >>> sorted(res.items()) [('call_spec', None), ('exp_moves', None), ('nd_edge', Edge(land=Cell(row='_', col='_'), mov=None, mod=None)), ('opts', None), ('sh_name', None), ('st_edge', Edge(land=Cell(row='^', col='^'), mov=None, mod=None))]
Errors:
>>> parse_xlref_fragment('A1(DR)Z20(UL)') Traceback (most recent call last): SyntaxError: Not an `xl-ref` syntax: A1(DR)Z20(UL)
4.1.8. Submodule: pandalone.xleash.io
¶
Backends for opening sheets from various sources.
4.1.9. Submodule: pandalone.xleash.io.backend
¶
The manager and the base for all backends fetching cells from actual workbooks and sheets.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash.io.backend.
ABCBackend
[source]¶ Bases:
abc.ABC
A plugin for a backend must implement and add instances into
io_backends
.
-
class
pandalone.xleash.io.backend.
ABCSheet
[source]¶ Bases:
abc.ABC
A sheet-wrapper that delegates to a backend factory, with utility methods.
Parameters: - _states_matrix (np.ndarray) – The states-matrix cached, so recreate object to refresh it.
- _margin_coords (dict) – limits used by
_resolve_cell()
, cached, so recreate object to refresh it.
Resource management is outside of the scope of this class, and must happen in the backend workbook/sheet instance.
xlrd examples:
>>> import xlrd >>> with xlrd.open_workbook(self.tmp) as wb: ... sheet = xleash.xlrdSheet(wb.sheet_by_name('Sheet1')) ... ## Do whatever
win32 examples:
>>> with dsgdsdsfsd as wb: ... sheet = xleash.win32Sheet(wb.sheet['Sheet1']) TODO: Win32 Sheet example
-
_read_margin_coords
()[source]¶ Override if possible to read (any of the) limits directly from the sheet.
Returns: the 2 coords of the top-left & bottom-right full cells; any of the coords can be None. By default returns (None, None)
.Return type: (Coords, Coords) Raise: EmptyCaptureException if sheet empty
-
_read_states_matrix
()[source]¶ Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False
wherever cells are blank or empty. Return type: ndarray
-
get_margin_coords
()[source]¶ Extract (and cache) margins either internally or from
margin_coords_from_states_matrix()
.Returns: the resolved top-left and bottom-right xleash.Coords
Return type: tuple Raise: EmptyCaptureException if sheet empty
-
get_sheet_ids
()[source]¶ Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index Return type: SheetId or None
-
get_states_matrix
()[source]¶ Read and cache the states-matrix of the wrapped sheet.
Returns: A 2D-array with False
wherever cells are blank or empty. Return type: ndarray Raise: EmptyCaptureException if sheet empty
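What a states-matrix amounts to can be sketched in pure Python (the real method returns a numpy ndarray; the blank-detection rule here is a simplifying assumption):

```python
def states_matrix(values):
    """True wherever a cell is non-blank; None/'' cells count as blank."""
    return [[v not in (None, '') for v in row] for row in values]

sheet_values = [
    ['', '',  ''],
    ['', 'a', ''],
    ['', 1,   2.0],
]
```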
-
read_rect
(st, nd)[source]¶ Fetch the actual values from the backend Excel-sheet.
Parameters: Returns: - Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Return type: Raise: EmptyCaptureException (optionally) if sheet empty
-
class
pandalone.xleash.io.backend.
ArraySheet
(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶ Bases:
pandalone.xleash.io.backend.ABCSheet
A sample
ABCSheet
made out of 2D-list or numpy-arrays, for facilitating tests.-
__init__
(arr, ids=SheetId(book='wb', ids=['sh', 0]))[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_read_states_matrix
()[source]¶ Read the states-matrix of the wrapped sheet.
Returns: A 2D-array with False
wherever cells are blank or empty. Return type: ndarray
-
get_sheet_ids
()[source]¶ Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index Return type: SheetId or None
-
read_rect
(st, nd)[source]¶ Fetch the actual values from the backend Excel-sheet.
Parameters: Returns: - Depends on whether both coords are given:
- If both given, 2D list-lists with the values of the rect, which might be empty if beyond limits.
- If only 1st given, the scalar value, and if beyond margins, raise error!
Return type: Raise: EmptyCaptureException (optionally) if sheet empty
-
-
class
pandalone.xleash.io.backend.
SheetId
(book, ids)¶ Bases:
tuple
-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, book, ids)¶ Create new instance of SheetId(book, ids)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new SheetId object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new SheetId object replacing specified fields with new values
-
book
¶ Alias for field number 0
-
ids
¶ Alias for field number 1
-
-
class
pandalone.xleash.io.backend.
SheetsFactory
(backends=None)[source]¶ Bases:
pandalone.xleash.io.backend.SimpleSheetsFactory
A caching-store of
ABCSheet
instances, serving them based on (workbook, sheet) IDs, optionally creating them from backends.Variables: _cached_sheets (dict) – A cache of all _Spreadsheets accessed so far, keyed by multiple keys generated by _derive_sheet_keys()
.- To avoid opening non-trivial workbooks, use the
add_sheet()
to pre-populate this cache with them. - It is a resource-manager for contained sheets, so it can be used with
a
with
statement.
-
__init__
(backends=None)[source]¶ Parameters: backends – The list of backends
to consider when opening sheets. If it evaluates to false, io_backends
is assumed. Type backends: list or None
-
_derive_sheet_keys
(sheet, wb_ids=None, sh_ids=None)[source]¶ Returns the product of user-specified and sheet-internal keys.
Parameters: - wb_ids – a single or a sequence of extra workbook-ids (ie: file, url)
- sh_ids – a single or sequence of extra sheet-ids (ie: name, index, None)
-
class
pandalone.xleash.io.backend.
SimpleSheetsFactory
(backends=None)[source]¶ Bases:
object
Asks backends to bid for creating
ABCSheet
instances - client should handle resources.Backends are taken from
io_backends
or specified during construction.
-
pandalone.xleash.io.backend.
margin_coords_from_states_matrix
(states_matrix)[source]¶ Returns top-left/bottom-right margins of full cells from a states-matrix.
May be used by
ABCSheet.get_margin_coords()
if a backend does not report the sheet-margins internally. Parameters: states_matrix (np.ndarray) – A 2D-array with False
wherever cells are blank or empty. Use ABCSheet.get_states_matrix()
to derive it. Returns: the 2 coords of the top-left & bottom-right full cells Return type: (Coords, Coords) - Examples::
>>> states_matrix = np.asarray([ ... [0, 0, 0], ... [0, 1, 0], ... [0, 1, 1], ... [0, 0, 1], ... ]) >>> margins = margin_coords_from_states_matrix(states_matrix) >>> margins (Coords(row=1, col=1), Coords(row=3, col=2))
Note that the bottom-right cell is not the same as the
states_matrix
matrix size:>>> states_matrix = np.asarray([ ... [0, 0, 0, 0], ... [0, 1, 0, 0], ... [0, 1, 1, 0], ... [0, 0, 1, 0], ... [0, 0, 0, 0], ... ]) >>> margin_coords_from_states_matrix(states_matrix) == margins True
4.1.10. Submodule: pandalone.xleash.io._xlrd
¶
Implements the xlrd backend of xleash that reads in-file Excel-spreadsheets.
-
class
pandalone.xleash.io._xlrd.
XlrdBackend
[source]¶
-
class
pandalone.xleash.io._xlrd.
XlrdSheet
(sheet, book_fname, epoch1904=False)[source]¶ Bases:
pandalone.xleash.io.backend.ABCSheet
The xlrd workbook wrapper required by xleash library.
-
__init__
(sheet, book_fname, epoch1904=False)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_read_margin_coords
()[source]¶ Override if possible to read (any of the) limits directly from the sheet.
Returns: the 2 coords of the top-left & bottom-right full cells; any of the coords can be None. By default returns (None, None)
.Return type: (Coords, Coords) Raise: EmptyCaptureException if sheet empty
-
get_sheet_ids
()[source]¶ Returns: a 2-tuple of its wb-name and the sheet-ids of this sheet, i.e. name & index Return type: SheetId or None
-
-
pandalone.xleash.io._xlrd.
_open_sheet_by_name_or_index
(xlrd_book, wb_id, sheet_id)[source]¶ Parameters: sheet_id (int or str or None) – If None
, opens 1st sheet.
-
pandalone.xleash.io._xlrd.
_parse_cell
(xcell, epoch1904=False)[source]¶ Parse an excel cell.
Parameters: - xcell (xlrd.sheet.Cell) – an excel xcell
- epoch1904 (bool) – Which date system was in force when this file was last saved. False => 1900 system (the Excel for Windows default). True => 1904 system (the Excel for Macintosh default).
Returns: formatted xcell value
Return type: int, float, datetime.datetime, bool, None, str, datetime.time, float(‘nan’)
Examples:
>>> import xlrd >>> from xlrd.sheet import Cell >>> _parse_cell(Cell(xlrd.XL_CELL_NUMBER, 1.2)) 1.2 >>> _parse_cell(Cell(xlrd.XL_CELL_DATE, 1.2)) datetime.datetime(1900, 1, 1, 4, 48) >>> _parse_cell(Cell(xlrd.XL_CELL_TEXT, 'hi')) 'hi'
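The epoch1904 flag amounts to a different day-zero; a simplified conversion sketch (ignoring xlrd's handling of the 1900 leap-year quirk, so not xlrd's exact algorithm):

```python
import datetime

def xldate_to_datetime(xldate, epoch1904=False):
    """Convert an Excel serial date to a datetime (simplified sketch)."""
    epoch = (datetime.datetime(1904, 1, 1) if epoch1904
             else datetime.datetime(1899, 12, 31))
    return epoch + datetime.timedelta(days=xldate)

print(xldate_to_datetime(1.2))  # 1900-01-01 04:48:00, as in the doctest above
```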
4.1.11. Submodule: pandalone.xleash._capture
¶
The algorithmic part of capturing.
Prefer accessing the public members from the parent module.
-
pandalone.xleash._capture.
CHECK_CELLTYPE
= False¶ When
True
, most coord-functions accept any 2-tuples.
-
exception
pandalone.xleash._capture.
EmptyCaptureException
[source]¶ Bases:
Exception
Thrown when targeting fails.
-
pandalone.xleash._capture.
_col2num
(coord)[source]¶ Resolves special coords or converts Excel A1 columns to zero-based numbers, reporting invalids.
Parameters: coord (str) – excel-column coordinate or one of ^_.
Returns: excel column number, >= 0 Return type: int Examples:
>>> col = _col2num('D') >>> col 3 >>> _col2num('d') == col True >>> _col2num('AaZ') 727 >>> _col2num('10') 9 >>> _col2num(9) 8
Negatives (from left-end) are preserved:
>>> _col2num('-1') -1
Fails ugly:
>>> _col2num('%$') Traceback (most recent call last): ValueError: substring not found >>> _col2num([]) Traceback (most recent call last): TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
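The letters-to-number part is plain base-26 arithmetic on A..Z; a minimal sketch (not the library's code, which also handles the special ^ _ coords and numeric strings):

```python
def col_letters2num(letters):
    """'A' -> 0, 'Z' -> 25, 'AA' -> 26, ... zero-based Excel columns."""
    num = 0
    for ch in letters.upper():
        num = num * 26 + (ord(ch) - ord('A') + 1)
    return num - 1

print(col_letters2num('D'))    # 3
print(col_letters2num('AaZ'))  # 727, as in the doctest above
```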
-
pandalone.xleash._capture.
_expand_rect
(states_matrix, r1, r2, exp_moves)[source]¶ Applies the expansion-moves based on the
states_matrix
.Parameters: Returns: a sorted rect top-left/bottom-right
Examples:
>>> states_matrix = np.array([ ... #0 1 2 3 4 5 ... [0, 0, 0, 0, 0, 0], #0 ... [0, 0, 1, 1, 1, 0], #1 ... [0, 1, 0, 0, 1, 0], #2 ... [0, 1, 1, 1, 1, 0], #3 ... [0, 0, 0, 0, 0, 1], #4 ... ], dtype=bool) >>> r1, r2 = (Coords(2, 1), Coords(2, 1)) >>> _expand_rect(states_matrix, r1, r2, 'U') (Coords(row=2, col=1), Coords(row=2, col=1)) >>> r1, r2 = (Coords(3, 1), Coords(2, 1)) >>> _expand_rect(states_matrix, r1, r2, 'R') (Coords(row=2, col=1), Coords(row=3, col=4)) >>> r1, r2 = (Coords(2, 1), Coords(6, 1)) >>> _expand_rect(states_matrix, r1, r2, 'r') (Coords(row=2, col=1), Coords(row=6, col=5)) >>> r1, r2 = (Coords(2, 3), Coords(2, 3)) >>> _expand_rect(states_matrix, r1, r2, 'LURD') (Coords(row=1, col=1), Coords(row=3, col=4))
-
pandalone.xleash._capture.
_extract_states_vector
(states_matrix, dn_coords, land, mov)[source]¶ Extract a slice from the states-matrix by starting from
land
and followingmov
.
-
pandalone.xleash._capture.
_resolve_cell
(cell, up_coords, dn_coords, base_coords=None)[source]¶ Translates any special coords to absolute ones.
To get the margin_coords, use one of:
ABCSheet.get_margin_coords()
io.backend.margin_coords_from_states_matrix()
Parameters: Returns: the resolved cell-coords
Return type: Examples:
>>> up = Coords(1, 2) >>> dn = Coords(10, 6) >>> base = Coords(40, 50) >>> _resolve_cell(Cell(col='B', row='5'), up, dn) Coords(row=4, col=1) >>> _resolve_cell(Cell('^', '^'), up, dn) Coords(row=1, col=2) >>> _resolve_cell(Cell('_', '_'), up, dn) Coords(row=10, col=6) >>> base == _resolve_cell(Cell('.', '.'), up, dn, base) True >>> _resolve_cell(Cell('-1', '-2'), up, dn) Coords(row=10, col=5) >>> _resolve_cell(Cell('A', 'B'), up, dn) Traceback (most recent call last): ValueError: invalid cell(Cell(row='A', col='B')) due to: invalid row('A') due to: invalid literal for int() with base 10: 'A'
But notice when base-cell missing:
>>> _resolve_cell(Cell('1', '.'), up, dn) Traceback (most recent call last): ValueError: invalid cell(Cell(row='1', col='.')) due to: Cannot resolve `relative-col` without `base-coord`!
-
pandalone.xleash._capture.
_resolve_coord
(cname, cfunc, coord, up_coord, dn_coord, base_coords=None)[source]¶ Translates special coords or converts Excel string 1-based rows/cols to zero-based, reporting invalids.
Parameters: - cname (str) – the coord-name, one of ‘row’, ‘column’
- cfunc (function) – the function to convert coord
str --> int
- coord (int, str) – the “A1” coord to translate
- up_coord (int) – the resolved top or left margin zero-based coordinate
- dn_coord (int) – the resolved bottom or right margin zero-based coordinate
- base_coords (int, None) – the resolved basis for dependent coord, if any
Returns: the resolved coord or
None
if it were not a special coord.Row examples:
>>> cname = 'row' >>> r0 = _resolve_coord(cname, _row2num, '1', 1, 10) >>> r0 0 >>> r0 == _resolve_coord(cname, _row2num, 1, 1, 10) True >>> _resolve_coord(cname, _row2num, '^', 1, 10) 1 >>> _resolve_coord(cname, _row2num, '_', 1, 10) 10 >>> _resolve_coord(cname, _row2num, '.', 1, 10, 13) 13 >>> _resolve_coord(cname, _row2num, '-3', 0, 10) 8
But notice when base-cell missing:
>>> _resolve_coord(cname, _row2num, '.', 0, 10, base_coords=None) Traceback (most recent call last): ValueError: Cannot resolve `relative-row` without `base-coord`!
Other ROW error-checks:
>>> _resolve_coord(cname, _row2num, '0', 0, 10) Traceback (most recent call last): ValueError: invalid row('0') due to: Uncooked-coord cannot be zero! >>> _resolve_coord(cname, _row2num, 'a', 0, 10) Traceback (most recent call last): ValueError: invalid row('a') due to: invalid literal for int() with base 10: 'a' >>> _resolve_coord(cname, _row2num, None, 0, 10) Traceback (most recent call last): ValueError: invalid row(None) due to: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Column examples:
>>> cname = 'column' >>> _resolve_coord(cname, _col2num, 'A', 1, 10) 0 >>> _resolve_coord(cname, _col2num, 'DADA', 1, 10) 71084 >>> _resolve_coord(cname, _col2num, '.', 1, 10, 13) 13 >>> _resolve_coord(cname, _col2num, '-4', 0, 10) 7
And COLUMN error-checks:
>>> _resolve_coord(cname, _col2num, None, 0, 10) Traceback (most recent call last): ValueError: invalid column(None) due to: int() argument must be a string, a bytes-like object or a number, not 'NoneType' >>> _resolve_coord(cname, _col2num, 0, 0, 10) Traceback (most recent call last): ValueError: invalid column(0) due to: Uncooked-coord cannot be zero!
-
pandalone.xleash._capture.
_row2num
(coord)[source]¶ Resolves special coords or converts Excel 1-based rows to zero-based, reporting invalids.
Parameters: coord (str, int) – excel-row coordinate or one of ^_.
Returns: excel row number, >= 0 Return type: int Examples:
>>> row = _row2num('1') >>> row 0 >>> row == _row2num(1) True
Negatives (from bottom) are preserved:
>>> _row2num('-1') -1
Fails ugly:
>>> _row2num('.') Traceback (most recent call last): ValueError: invalid literal for int() with base 10: '.'
-
pandalone.xleash._capture.
_sort_rect
(r1, r2)[source]¶ Sorts rect-vertices in a 2D-array (with vertices in rows).
Example:
>>> _sort_rect((5, 3), (4, 6)) array([[4, 3], [5, 6]])
-
pandalone.xleash._capture.
_target_opposite
(states_matrix, dn_coords, land, moves, edge_name='')[source]¶ Follow moves from
land
and stop on the 1st full-cell.Parameters: Returns: the identified target-cell’s coordinates
Return type: Examples:
>>> states_matrix = np.array([ ... [0, 0, 0, 0, 0, 0], ... [0, 0, 0, 0, 0, 0], ... [0, 0, 0, 1, 1, 1], ... [0, 0, 1, 0, 0, 1], ... [0, 0, 1, 1, 1, 1] ... ]) >>> args = (states_matrix, Coords(4, 5)) >>> _target_opposite(*(args + (Coords(0, 0), 'DR'))) Coords(row=3, col=2) >>> _target_opposite(*(args + (Coords(0, 0), 'RD'))) Coords(row=2, col=3)
It fails if a non-empty target-cell cannot be found, or it ends-up beyond bounds:
>>> _target_opposite(*(args + (Coords(0, 0), 'D'))) Traceback (most recent call last): pandalone.xleash._capture.EmptyCaptureException: No opposite-target found while moving(D) from landing-Coords(row=0, col=0)! >>> _target_opposite(*(args + (Coords(0, 0), 'UR'))) Traceback (most recent call last): pandalone.xleash._capture.EmptyCaptureException: No opposite-target found while moving(UR) from landing-Coords(row=0, col=0)!
But notice that the landing-cell may be outside of bounds:
>>> _target_opposite(*(args + (Coords(3, 10), 'L'))) Coords(row=3, col=5)
-
pandalone.xleash._capture.
_target_same
(states_matrix, dn_coords, land, moves, edge_name='')[source]¶ Scan the exterior
row and column on the specified moves
and stop on the last full-cell.Parameters: Returns: the identified target-cell’s coordinates
Return type: Examples:
>>> states_matrix = np.array([ ... [0, 0, 0, 0, 0, 0], ... [0, 0, 0, 0, 0, 0], ... [0, 0, 0, 1, 1, 1], ... [0, 0, 1, 0, 0, 1], ... [0, 0, 1, 1, 1, 1] ... ]) >>> args = (states_matrix, Coords(4, 5)) >>> _target_same(*(args + (Coords(4, 5), 'U'))) Coords(row=2, col=5) >>> _target_same(*(args + (Coords(4, 5), 'L'))) Coords(row=4, col=2) >>> _target_same(*(args + (Coords(4, 5), 'UL', ))) Coords(row=2, col=2)
It fails if landing is empty or beyond bounds:
>>> _target_same(*(args + (Coords(2, 2), 'DR'))) Traceback (most recent call last): pandalone.xleash._capture.EmptyCaptureException: No same-target found while moving(DR) from landing-Coords(row=2, col=2)! >>> _target_same(*(args + (Coords(10, 3), 'U'))) Traceback (most recent call last): pandalone.xleash._capture.EmptyCaptureException: No same-target found while moving(U) from landing-Coords(row=10, col=3)!
-
pandalone.xleash._capture.
_target_same_vector
(states_matrix, dn_coords, land, mov)[source]¶ Parameters:
-
pandalone.xleash._capture.
coords2Cell
(row, col)[source]¶ Make an A1
Cell
from resolved coords, with rudimentary error-checking.Examples:
>>> coords2Cell(row=0, col=0) Cell(row='1', col='A') >>> coords2Cell(row=0, col=26) Cell(row='1', col='AA') >>> coords2Cell(row=10, col='.') Cell(row='11', col='.') >>> coords2Cell(row=-3, col=-2) Traceback (most recent call last): AssertionError: negative row!
-
pandalone.xleash._capture.
resolve_capture_rect
(states_matrix, up_dn_margins, st_edge, nd_edge=None, exp_moves=None, base_coords=None)[source]¶ Performs targeting, capturing and expansions based on the states-matrix.
To get the margin_coords, use one of:
ABCSheet.get_margin_coords()
io.backend.margin_coords_from_states_matrix()
Its results can be fed into
read_capture_values()
.Parameters: - states_matrix (np.ndarray) – A 2D-array with
False
wherever cells are blank or empty. Use ABCSheet.get_states_matrix()
to derive it. - up_dn_margins ((Coords, Coords)) – the top-left/bottom-right coords with full-cells
- st_edge (Edge) – “uncooked” as matched by regex
- nd_edge (Edge) – “uncooked” as matched by regex
- exp_moves (list or None) – Just the parsed string, and not
None
. - base_coords (Coords) – The base for a dependent 1st edge.
Returns: a
(Coords, Coords)
with the 1st and 2nd capture-cell ordered from top-left –> bottom-right.Return type: Raises: EmptyCaptureException – When targeting failed, and no target cell identified.
- Examples::
>>> from pandalone.xleash import Edge, margin_coords_from_states_matrix
>>> states_matrix = np.array([ ... [0, 0, 0, 0, 0, 0], ... [0, 0, 0, 0, 0, 0], ... [0, 0, 0, 1, 1, 1], ... [0, 0, 1, 0, 0, 1], ... [0, 0, 1, 1, 1, 1] ... ], dtype=bool) >>> up, dn = margin_coords_from_states_matrix(states_matrix)
>>> st_edge = Edge(Cell('1', 'A'), 'DR') >>> nd_edge = Edge(Cell('.', '.'), 'DR') >>> resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge) (Coords(row=3, col=2), Coords(row=4, col=2))
Using dependenent coordinates for the 2nd edge:
>>> st_edge = Edge(Cell('_', '_'), None) >>> nd_edge = Edge(Cell('.', '.'), 'UL') >>> rect = resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge) >>> rect (Coords(row=2, col=2), Coords(row=4, col=5))
Using sheet’s margins:
>>> st_edge = Edge(Cell('^', '_'), None) >>> nd_edge = Edge(Cell('_', '^'), None) >>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge) True
Walking backwards:
>>> st_edge = Edge(Cell('^', '_'), 'L') # Landing is full, so 'L' ignored. >>> nd_edge = Edge(Cell('_', '_'), 'L', '+') # '+' or would also stop. >>> rect == resolve_capture_rect(states_matrix, (up, dn), st_edge, nd_edge) True
4.1.12. Submodule: pandalone.xleash._filter
¶
The high-level functionality, the filtering and recursive lassoing.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash._filter.
XLocation
(sheet, st, nd, base_coords)¶ Bases:
tuple
Fields denoting the position of a sheet/cell while running an element-wise-filter.
Practically,
run_filter_elementwise() preserves these fields if the processed ones were None
.-
__getnewargs__
()¶ Return self as a plain tuple. Used by copy and pickle.
-
static
__new__
(_cls, sheet, st, nd, base_coords)¶ Create new instance of XLocation(sheet, st, nd, base_coords)
-
__repr__
()¶ Return a nicely formatted representation string
-
_asdict
()¶ Return a new OrderedDict which maps field names to their values.
-
classmethod
_make
(iterable)¶ Make a new XLocation object from a sequence or iterable
-
_replace
(**kwds)¶ Return a new XLocation object replacing specified fields with new values
-
base_coords
¶ Alias for field number 3
-
nd
¶ Alias for field number 2
-
sheet
¶ Alias for field number 0
-
st
¶ Alias for field number 1
-
-
pandalone.xleash._filter.
_classify_rect_shape
(st, nd)[source]¶ Identifies the rect-shape from its edge-coordinates (row, col, 2d-table).
Parameters: Returns: an int based on the input, like that:
- 0: only
st
given - 1:
st
andnd
point the same cell - 2: row
- 3: col
- 4: 2d-table
Examples:
>>> _classify_rect_shape((1,1), None) 0 >>> _classify_rect_shape((2,2), (2,2)) 1 >>> _classify_rect_shape((2,2), (2,20)) 2 >>> _classify_rect_shape((2,2), (20,2)) 3 >>> _classify_rect_shape((2,2), (20,20)) 4
-
pandalone.xleash._filter.
_downdim
(values, new_ndim)[source]¶ Squeeze it, and then flatten it, before inflating it.
Parameters: - values – The scalar or 2D-results of
Sheet.read_rect()
- new_dim (int) – The new dimension the result should have
-
pandalone.xleash._filter.
_redim
(values, new_ndim)[source]¶ Reshapes the capture-rect values of
read_capture_rect()
Parameters: - values ((nested) list, *) – The scalar or 2D-results of
Sheet.read_rect()
- new_ndim –
Returns: reshaped values
Return type: list of lists, list, *
Examples:
>>> _redim([1, 2], 2) [[1, 2]] >>> _redim([[1, 2]], 1) [1, 2] >>> _redim([], 2) [[]] >>> _redim([[3.14]], 0) 3.14 >>> _redim([[11, 22]], 0) [11, 22] >>> arr = [[[11], [22]]] >>> arr == _redim(arr, None) True >>> _redim([[11, 22]], 0) [11, 22]
-
pandalone.xleash._filter.
_updim
(values, new_ndim)[source]¶ Append trivial dimensions to the left.
Parameters: - values – The scalar or 2D-results of
Sheet.read_rect()
- new_dim (int) – The new dimension the result should have
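The up/down dimensioning pair can be illustrated with plain numpy (a conceptual sketch of the two operations, not the actual implementation):

```python
import numpy as np

vals = np.asarray([[1, 2, 3]])      # a 1x3 capture-rect, ndim == 2

down = np.squeeze(vals)             # "downdim": drop trivial dimensions
print(down.tolist(), down.ndim)     # [1, 2, 3] 1

up = down[np.newaxis, :]            # "updim": prepend a trivial dimension
print(up.tolist(), up.ndim)         # [[1, 2, 3]] 2
```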
-
pandalone.xleash._filter.
install_default_filters
(filters_dict)[source]¶ Updates the default available filters used by
lasso()
when constructing its internalRanger
.Parameters: filters_dict (dict) – The dictionary to update with the default filters.
-
pandalone.xleash._filter.
pipe_filter
(ranger, lasso, *filters, **kwds)[source]¶ A bulk-filter that applies all call-specifiers one after another on the capture-rect values.
Parameters: filters (list) – the json-parsed call-spec
-
pandalone.xleash._filter.
py_filter
(ranger, lasso, expr)[source]¶ A bulk-filter that passes values through a python-expression using
asteval
library.
The expr may access, read-write, all locals() of this method (ranger, lasso), the numpy funcs, and the pandalone.xleash module under the xleash
variable.- The
expr
may return either: - the processed values, or
- an instance of the
Lasso
, in which case only itsopt
field is checked and replaced with original if missing. So better usenamedtuple._replace()
on the currentlasso
which exists in the expr’s namespace.
Parameters: expr (str) – The python-expression, which may comprise multiple statements.
-
pandalone.xleash._filter.
pyeval_filter
(ranger, lasso, filters=(), eval_all=False, include=None, exclude=None, depth=-1)[source]¶ An element-wise-filter that uses
asteval
to evaluate string values as python expressions.
The expr fetched from the capturing may access, read-write, all locals() of this method (i.e. ranger, lasso), the numpy funcs, and the pandalone.xleash module under the xleash
variable.- The
expr
may return either: - the processed values, or
- an instance of the
Lasso
, in which case only itsopt
field is checked and replaced with original if missing. So better usenamedtuple._replace()
on the currentlasso
which exists in the expr’s namespace.
Parameters: - eval_all (bool) – If
True
, raise on 1st error and stop diving cells. Defaults to False
. - filters (list) – Any filters to apply after invoking the
element_func
. - include (list or str) – Items to include when diving into “indexed” values.
See
run_filter_elementwise()
. - exclude (list or str) – Items to exclude when diving into “indexed” values.
See
run_filter_elementwise()
. - depth (int or None) – How deep to dive into nested structures, “indexed” or lists.
If
< 0
, no limit. If 0, stops completely. Seerun_filter_elementwise()
.
Example:
>>> from pandalone import xleash >>> expr = ''' ... res = array([[0.5, 0.3, 0.1, 0.1]]) ... res * res.T ... ''' >>> lasso = Lasso(values=expr, opts={}) >>> with xleash.SheetsFactory() as sf: ... ranger = xleash.Ranger(sf) ... pyeval_filter(ranger, lasso).values array([[0.25, 0.15, 0.05, 0.05], [0.15, 0.09, 0.03, 0.03], [0.05, 0.03, 0.01, 0.01], [0.05, 0.03, 0.01, 0.01]])
-
pandalone.xleash._filter.
recursive_filter
(ranger, lasso, filters=(), include=None, exclude=None, depth=-1)[source]¶ An element-wise-filter that recursively expands any xl-ref string elements in capture-rect values.
Parameters: - filters (list) – Any filters to apply after invoking the
element_func
. - include (list or str) – Items to include when diving into “indexed” values.
See
run_filter_elementwise()
. - exclude (list or str) – Items to exclude when diving into “indexed” values.
See
run_filter_elementwise()
. - depth (int or None) – How deep to dive into nested structures, “indexed” or lists.
If
< 0
, no limit. If 0, stops completely. Seerun_filter_elementwise()
.
-
pandalone.xleash._filter.
redim_filter
(ranger, lasso, scalar=None, cell=None, row=None, col=None, table=None)[source]¶ A bulk-filter that reshapes and/or transposes captured values, depending on the rect’s shape.
Each dimension might be a single int or None, or a pair [dim, transpose].
-
pandalone.xleash._filter.
run_filter_elementwise
(ranger, lasso, element_func, filters, include=None, exclude=None, depth=-1, *args, **kwds)[source]¶ Runner of all element-wise filters.
It applies the
element_func
on elements extracted fromlasso.values
by treating the latter first as “indexed” objects (Mappings, Series and Dataframes), and if that fails, as nested lists.The
include
/exclude
filter args work only for “indexed” objects withitems()
and indexing methods.- If no filter arg specified, expands for all keys.
- If only
include
specified, rejects all keys not explicitly contained in this filter arg. - If only
exclude
specified, expands all keys not explicitly contained in this filter arg. - When both
include
/exclude
exist, only those explicitly included are accepted, unless also excluded.
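Those include/exclude rules can be sketched with a tiny helper (an illustration of the semantics only; `accept_key` is a hypothetical name, not part of the library):

```python
def accept_key(key, include=None, exclude=None):
    """Mimic the include/exclude rules described above."""
    if include is not None and key not in include:
        return False            # only explicitly included keys pass
    if exclude is not None and key in exclude:
        return False            # ...unless also excluded
    return True

print(accept_key('a'))                                 # True: no filter args
print(accept_key('c', include=['a', 'b']))             # False
print(accept_key('c', exclude=['a', 'b']))             # True
print(accept_key('a', include=['a'], exclude=['a']))   # False
```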
Lower the
logging
level to see other than syntax-errors on recursion reported onlog
.Only those in
XLocation
are passed recursively.
Parameters: - element_func (list) –
A function implementing the element-wise filter and returning a 2-tuple
(is_processed, new_val_or_lasso)
, like that:

def element_func(ranger, lasso, context, elval):
    proced = False
    try:
        elval = int(elval)
        proced = True
    except ValueError:
        pass
    return proced, elval
Its
kwds
may contain theinclude
,exclude
anddepth
args. Any exception raised fromelement_func
will cancel the diving. - filters (list) – Any filters to apply after invoking the
element_func
. - include (list or str) – Items to include when diving into “indexed” values. See description above.
- exclude (list or str) – Items to exclude when diving into “indexed” values. See description above.
- depth (int or None) – How deep to dive into nested structures, “indexed” or lists.
If
< 0
, no limit. If 0, stops completely.
Params args: To be relayed to ‘element_func’.
Params kwds: To be relayed to ‘element_func’.
4.1.13. Submodule: pandalone.xleash._lasso
¶
The high-level functionality, the filtering and recursive lassoing.
Prefer accessing the public members from the parent module.
-
class
pandalone.xleash._lasso.
Ranger
(sheets_factory, base_opts=None, available_filters=None)[source]¶ Bases:
object
The director-class that performs all stages required for “throwing the lasso” around rect-values.
Use it when you need to have total control of the procedure and configuration parameters, since no defaults are assumed.
The
do_lasso()
does the job.Variables: - sheets_factory (SheetsFactory) – Factory of sheets from where to parse rect-values; does not
close it in the end.
May be
None
, butdo_lasso()
will scream unless invoked with acontext_lasso
arg containing a concreteABCSheet
. - base_opts (dict) – The opts that are deep-copied and used as the defaults
for every
do_lasso()
, whether invoked directly or recursively byrecursive_filter()
. If unspecified, no opts are used, but this attr is set to an empty dict. Seeget_default_opts()
. - available_filters (dict or None) – The filters available for an xl-ref to use.
If
None
, then usesxleash.installed_filters
. Use an empty dict not to use any filters. - intermediate_lasso (Lasso) – A
('stage', Lasso)
pair with the lastLasso
instance produced during the last execution of thedo_lasso()
. Used for inspecting/debuging.
-
__init__
(sheets_factory, base_opts=None, available_filters=None)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
_make_init_Lasso
(**context_kwds)[source]¶ Creates the lasso to be used for each new
do_lasso()
invocation.
-
_parse_and_merge_with_context
(xlref, init_lasso)[source]¶ Merges the xl-ref parsed-fields with
init_lasso
, reporting any errors.Parameters: init_lasso (Lasso) – Default values to be overridden by non-nulls. Returns: a Lasso with any non-None
parsed-fields updated
-
_resolve_capture_rect
(lasso, sheet)[source]¶ Also handles
EmptyCaptureException
in caseopts['no_empty'] != False
.
-
pandalone.xleash._lasso.
get_default_opts
(overrides=None)[source]¶ Default opts used by
lasso()
when constructing its internalRanger
.Parameters: overrides (dict or None) – Any items to update the default ones.
-
pandalone.xleash._lasso.
lasso
(xlref, sheets_factory=None, base_opts=None, available_filters=None, return_lasso=False, **context_kwds)[source]¶ High-level function to lasso around spreadsheet’s rect-regions according to xl-ref strings by using internally a
Ranger
.Parameters: - xlref (str) –
a string with the xl-ref format:
<url_file>#<sheet>!<1st_edge>:<2nd_edge>:<expand><js_filt>
i.e.:
file:///path/to/file.xls#sheet_name!UPT8(LU-):_.(D+):LDL1{"dims":1}
- sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
the new
SheetsFactory
created is closed afterwards. Delegated tomake_default_Ranger()
, so items override default ones; use a newRanger
if that is not desired. - available_filters (dict or None) – Delegated to
make_default_Ranger()
, so items override default ones; use a newRanger
if that is not desired. - return_lasso (bool) –
If
True
, values are contained in the returned Lasso instance, along with all other artifacts of the lassoing procedure.For more debugging help, create a
Range
yourself and inspect theRanger.intermediate_lasso
. - context_kwds (Lasso) – Default
Lasso
fields in case parsed ones areNone
(i.e. you can specify the sheet like that).
Variables: base_opts – Opts affecting the lassoing procedure that are deep-copied and used as the base-opts for every
Ranger.do_lasso()
, whether invoked directly or recursively byrecursive_filter()
. Read the code to be sure what are the available choices. Delegated tomake_default_Ranger()
, so items override default ones; use a newRanger
if that is not desired.Returns: Either the captured & filtered values or the final
Lasso
, depending on the return_lasso
arg.Example:
-
pandalone.xleash._lasso.
make_default_Ranger
(sheets_factory=None, base_opts=None, available_filters=None)[source]¶ Makes a defaulted
Ranger
.Parameters: - sheets_factory – Factory of sheets from where to parse rect-values; if unspecified,
a new
SheetsFactory
is created. Remember to invoke itsSheetsFactory.close()
to clear resources from any opened sheets. - base_opts (dict or None) –
Default opts to affect the lassoing, to be merged with defaults; uses
get_default_opts()
.Read the code to be sure what are the available choices :-(.
- available_filters (dict or None) – The filters available for an xl-ref to use.
(
xleash.installed_filters
used if unspecified).
For instance, to make your own sheets-factory and override options, you may do this:
>>> from pandalone import xleash >>> with xleash.SheetsFactory() as sf: ... xleash.make_default_Ranger(sf, base_opts={'lax': True}) <pandalone.xleash._lasso.Ranger object at ...
4.2. Module: pandalone.mappings
¶
Hierarchical string-like objects useful for indexing, that can be renamed/relocated at a later stage.
Pstep |
Automagically-constructed relocatable paths for accessing data-tree. |
pmods_from_tuples (pmods_tuples) |
Turns a list of 2-tuples into a pmods hierarchy. |
Pmod ([_alias, _steps, _regxs]) |
A path-step mapping forming the pmods-hierarchy. |
Example:
>>> from pandalone.mappings import pmods_from_tuples
>>> pmods = pmods_from_tuples([
... ('', 'deeper/ROOT'),
... ('/abc', 'ABC'),
... ('/abc/foo', 'BAR'),
... ])
>>> p = pmods.step()
>>> p.abc.foo
`BAR`
>>> p._paths()
['deeper/ROOT/ABC/BAR']
- TODO: Implement “anywhere” pmods(
//
).
-
class
pandalone.mappings.
Pmod
(_alias=None, _steps={}, _regxs={})[source]¶ Bases:
object
A path-step mapping forming the pmods-hierarchy.
The pmods denote the hierarchy of all mappings that either rename or relocate path-steps.
A single mapping transforms an “origin” path to a “destination” one (also called the “from” and “to” paths).
A mapping always transforms the final path-step, like that:
FROM_PATH TO_PATH RESULT_PATH --------- ------- ----------- /rename/path foo --> /rename/foo ## renaming /relocate/path foo/bar --> /relocate/foo/bar ## relocation '' a/b/c --> /a/b/c ## Relocate all paths. / a/b/c --> /a/b/c ## Relocates 1st "empty-str" step.
The pmod is the mapping of that single path-step.
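That final-step replacement can be sketched with plain string ops (`map_final_step` is a hypothetical helper for illustration, not the library's code):

```python
def map_final_step(path, to):
    """Replace the last step of a '/rooted/path'; a multi-step `to` relocates."""
    parent, _, _ = path.rpartition('/')
    return '%s/%s' % (parent, to)

print(map_final_step('/rename/path', 'foo'))        # /rename/foo
print(map_final_step('/relocate/path', 'foo/bar'))  # /relocate/foo/bar
```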
It is possible to match fully on path-steps using regular-expressions, and then to use any captured-groups from the final step into the mapped value:
(/all(.*)/path, foo) + all_1/path --> /all_1/foo + all_XYZ --> /all_XYZ ## no change (/all(.*)/path, foo\1) + all_1/path --> /all_1/foo_1
If more than one regex match, they are merged in the order declared (the latest one overrides a previous one).
Any exact child-name matches are applied and merged after regexs.
Use
pmods_from_tuples()
to construct the pmods-hierarchy.The pmods are used internally by class:
Pstep
to map the component-paths of their input & output onto the actual value-tree paths.
Variables: Example:
Note
Do not manually construct instances from this class! To construct a hierarchy use the
pmods_from_tuples()
or pass mappings as the 2nd argument inPstep
constructor.You can either use it for massively map paths, either for renaming them:
>>> pmods = pmods_from_tuples([ ... ('/a', 'A'), ... ('/~b.*', r'BB\g<0>'), ## Previous match. ... ('/~b.*/~c.(.*)', r'W\1ER'), ## Capturing-group(1) ... ]) >>> pmods.map_paths(['/a', '/a/foo']) ## 1st rule ['/A', '/A/foo'] >>> pmods.map_path('/big/stuff') ## 2nd rule '/BBbig/stuff' >>> pmods.map_path('/born/child') ## 2nd & 3rd rule '/BBborn/WildER'
or to relocate them:
>>> pmods = pmods_from_tuples([ ... ('/a', 'A/AA'), ... ('/~b.*/~c(.*)', r'../C/\1'), ... ('/~b.*/~.*/~r.*', r'/\g<0>'), ... ]) >>> pmods.map_paths(['/a/foo', '/big/child', '/begin/from/root']) ['/A/AA/foo', '/big/C/hild', '/root']
Here is how you relocate “root” (notice that the
''
path is the root):>>> pmods = pmods_from_tuples([('', '/NEW/ROOT')]) >>> pmods.map_paths(['/a/foo', '']) ['/NEW/ROOT/a/foo', '/NEW/ROOT']
-
__init__
(_alias=None, _steps={}, _regxs={})[source]¶ Args are passed only for testing; remember
_regxs
must be a (k, v) tuple-list!Note
Mutable arg-defaults (empty dicts) are knowingly used to preserve memory; never append to them!
-
_append_into_regxs
(key)[source]¶ Inserts a child-mapping into
_regxs
dict.Parameters: key (str) – the regex-pattern to add
-
_append_into_steps
(key)[source]¶ Inserts a child-mapping into
_steps
dict.Parameters: key (str) – the step-name to add
-
_merge
(other)[source]¶ Clone and override all its props with props from other-pmod, recursively.
Although it does not modify this, the
other
or their children pmods, it may “share” (crosslink) them, so pmods MUST NOT be modified later.Parameters: other (Pmod) – contains the dicts with the overrides Returns: the cloned merged pmod Return type: Pmod Examples:
Look how
_steps
are merged:>>> pm1 = Pmod(_alias='pm1', _steps={ ... 'a':Pmod(_alias='A'), 'c':Pmod(_alias='C')}) >>> pm2 = Pmod(_alias='pm2', _steps={ ... 'b':Pmod(_alias='B'), 'a':Pmod(_alias='AA')}) >>> pm = pm1._merge(pm2) >>> sorted(pm._steps.keys()) ['a', 'b', 'c']
And here it is
_regxs
merging, which preserves order:>>> pm1 = Pmod(_alias='pm1', ... _regxs=[('d', Pmod(_alias='D')), ... ('a', Pmod(_alias='A')), ... ('c', Pmod(_alias='C'))]) >>> pm2 = Pmod(_alias='pm2', ... _regxs=[('b', Pmod(_alias='BB')), ... ('a', Pmod(_alias='AA'))]) >>> pm1._merge(pm2) pmod('pm2', OrderedDict([(re.compile('d'), pmod('D')), (re.compile('c'), pmod('C')), (re.compile('b'), pmod('BB')), (re.compile('a'), pmod('AA'))])) >>> pm2._merge(pm1) pmod('pm1', OrderedDict([(re.compile('b'), pmod('BB')), (re.compile('d'), pmod('D')), (re.compile('a'), pmod('A')), (re.compile('c'), pmod('C'))]))
-
_override_regxs
(other)[source]¶ Override this pmod’s
_regxs
dict with other’s, recursively.- It may “share” (crosslink) the dict and/or its child-pmods
between the two pmod args (
self
andother
). - No dict is modified (apart from self, which must have been cloned
previously by
Pmod._merge()
), to avoid side-effects in case they were “shared”. - It preserves dict-ordering so that
other
order takes precedence (its elements are the last ones).
Parameters: - It may “share” (crosslink) the dict and/or its child-pmods
between the two pmod args (
-
_override_steps
(other)[source]¶ Override this pmod’s ‘_steps’ dict with other’s, recursively.
Same as
_override_regxs()
but without caring for order.
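The order-preserving override rule of `_override_regxs()` can be illustrated with plain dicts (`merge_ordered` is a hypothetical sketch of the precedence semantics, not the library's code):

```python
def merge_ordered(mine, other):
    """Merge two ordered dicts so `other`'s items override AND come last."""
    merged = {k: v for k, v in mine.items() if k not in other}
    merged.update(other)        # other's entries land at the end, overriding
    return merged

print(merge_ordered({'d': 1, 'a': 2, 'c': 3}, {'b': 9, 'a': 8}))
# {'d': 1, 'c': 3, 'b': 9, 'a': 8} — same ordering as the _merge() doctest
```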
-
alias
(cstep)[source]¶ Like
descend()
but without merging child-pmods.Returns: the expanded alias from child/regexs or None
-
descend
(cstep)[source]¶ Return the child-pmod that merges any exact child with all matched regexps, along with its alias regex-expanded.
Parameters: cstep (str) – the child path-step cstep of the pmod to return Returns: the merged-child pmod, along with the alias; both might be None, if nothing matched, or no alias. Return type: tuple(Pmod, str) Example:
>>> pm = Pmod(
...     _steps={'a': Pmod(_alias='A')},
...     _regxs=[(r'a\w*', Pmod(_alias='AWord')),
...             (r'a(\d*)', Pmod(_alias=r'A_\1')),
...     ])
>>> pm.descend('a')
(pmod('A'), 'A')
>>> pm.descend('abc')
(pmod('AWord'), 'AWord')
>>> pm.descend('a12')
(pmod('A_\\1'), 'A_12')
>>> pm.descend('BAD')
(None, None)
Notice how children of regexps are merged together:
>>> pm = Pmod(
...     _steps={'a':
...        Pmod(_alias='A', _steps={1: 11})},
...     _regxs=[
...        (r'a\w*', Pmod(_alias='AWord',
...                       _steps={2: Pmod(_alias=22)})),
...        (r'a\d*', Pmod(_alias='ADigit',
...                       _steps={3: Pmod(_alias=33)})),
...     ])
>>> sorted(pm.descend('a')[0]._steps)   ## All children and regexps match.
[1, 2, 3]
>>> pm.descend('aa')[0]._steps          ## Only r'a\w*' matches.
{2: pmod(22)}
>>> sorted(pm.descend('a1')[0]._steps)  ## Both regexps match.
[2, 3]
So it is possible to say:
>>> pm.descend('a1')[0].alias(2)
22
>>> pm.descend('a1')[0].alias(3)
33
>>> pm.descend('a1')[0].descend('BAD')
(None, None)
>>> pm.descend('a$')
(None, None)
but it is better to use
map_path()
for this.
-
map_path
(path)[source]¶ Maps a ‘/rooted/path’ using all aliases while descending its child pmods.
It uses any aliases on all child pmods if found.
Parameters: path (str) – a rooted path to transform Returns: the rooted mapped path or ‘/’ if path was ‘/’ Return type: str or None Examples:
>>> pmods = pmods_from_tuples([
...     ('/a', 'A/AA'),
...     ('/~a(\\w*)', r'BB\1'),
...     ('/~a\\w*/~d.*', r'D \g<0>'),
...     ('/~a(\\d+)', r'C/\1'),
...     ('/~a(\\d+)/~(c.*)', r'CC-/\1'),      # The 1st group is ignored!
...     ('/~a\\d+/~e.*', r'/newroot/\g<0>'),  # Rooted mapping.
... ])
>>> pmods.map_path('/a')
'/A/AA'
>>> pmods.map_path('/a_hi')
'/BB_hi'
>>> pmods.map_path('/a12')
'/C/12'
>>> pmods.map_path('/a12/etc')
'/newroot/etc'
Notice how children from all matching prior-steps are merged:
>>> pmods.map_path('/a12/dow')
'/C/12/D dow'
>>> pmods.map_path('/a12/cow')
'/C/12/CC-/cow'
To map root use ‘’ which matches before the 1st slash(‘/’):
>>> pmods = pmods_from_tuples([('', 'New/Root'),])  ## Relative
>>> pmods
pmod({'': pmod('New/Root')})
>>> pmods.map_path('/for/plant')
'New/Root/for/plant'
>>> pmods_from_tuples([('', '/New/Root'),]).map_path('/for/plant')
'/New/Root/for/plant'
Note
Using slash(‘/’) for “from” path will NOT map root:
>>> pmods = pmods_from_tuples([('/', 'New/Root'),])
>>> pmods
pmod({'': pmod({'': pmod('New/Root')})})
>>> pmods.map_path('/for/plant')
'/for/plant'
>>> pmods.map_path('//for/plant')
'/New/Root/for/plant'
but ‘’ always remains unchanged (whole document):
>>> pmods.map_path('') ''
-
step
(pname='', alias=None)[source]¶ Create a new
Pstep
having this pmod as its mappings.If no
pname
is specified, creates a root pstep.Delegates to
Pstep.__new__()
.
-
class
pandalone.mappings.
Pstep
[source]¶ Bases:
str
Automagically-constructed relocatable paths for accessing data-tree.
The “magic” autocreates psteps as they are referenced, making code that accesses data-tree paths natural to write, while at the same time the “model” of that tree-data gets discovered.
Each pstep keeps internally the name of a data-tree step, which, when created through recursive referencing, coincides with the parent’s branch leading to this step. That name can be modified with
Pmod
so the same data-accessing code can refer to differently-named values in the data-tree.Variables: - _csteps (dict) – the child-psteps by their name (default
None
) - _pmod (dict) – path-modifications used to construct this and
relayed to children (default
None
) - _locked (int) – one of
-
Pstep.CAN_RELOCATE
(default), -Pstep.CAN_RENAME
, -Pstep.LOCKED
(neither from the above). - _tags (set) – A set of strings (default
()
) - _schema (dict) – json-schema data.
See
__new__()
for the internal constructor.Usage:
Use a
Pmod.pstep()
to construct a root pstep from mappings. Specify a string argument to construct a relative pstep-hierarchy.Just referencing (non-private) attributes creates them.
Private attributes and functions (starting with
_
) exist for specific operations (i.e. for specifying json-schema, or for collecting all paths).Assignments are only allowed for string-values, or to private attributes:
>>> p = Pstep()
>>> p.assignments = 12
Traceback (most recent call last):
AssertionError: Cannot assign '12' to '/assignments!
>>> p._but_hidden = 'Ok'
Use
_paths()
to get all defined paths so far.Construction:
>>> Pstep()
``
>>> Pstep('a')
`a`
Notice that psteps are surrounded with the back-tick char(‘`’).
Paths are created implicitly as they are referenced:
>>> m = {'a': 1, 'abc': 2, 'cc': 33}
>>> p = Pstep('a')
>>> assert m[p] == 1
>>> assert m[p.abc] == 2
>>> assert m[p.a321.cc] == 33
>>> sorted(p._paths())
['a/a321/cc', 'a/abc']
Any “path-mappings” or “pmods” may be specified during construction:
>>> from pandalone.mappings import pmods_from_tuples
>>> maps = [
...     ('', 'deeper/ROOT'),
...     ('/abc', 'ABC'),
...     ('/abc/foo', 'BAR'),
... ]
>>> p = Pstep('', pmods_from_tuples(maps))
OR
>>> pmods = pmods_from_tuples(maps)
>>> p = pmods.step()
>>> p.abc.foo
`BAR`
>>> p._paths()
['deeper/ROOT/ABC/BAR']
but exceptions are thrown if mapping any step marked as “locked”:
>>> p.abc.foo._locked  ## 3: CAN_RELOCATE
3
>>> p.abc.foo._lock  ## Screams, because `foo` is already mapped.
Traceback (most recent call last):
ValueError: Cannot rename/relocate 'foo'-->'BAR' due to LOCKED!
Warning
Creating an empty(
''
) step in some paths will “root” the path:

>>> p = Pstep()
>>> _ = p.a1.b
>>> _ = p.A2
>>> p._paths()
['/A2', '/a1/b']
>>> _ = p.a1.a2.c
>>> _ = p.a1.a2 = ''
>>> p._paths()
['/A2', '/a1/b', '/c']
-
static
__new__
(cls, pname=None, maps=None, alias=None, *tags)[source]¶ Constructs a string with str-content which may come from the mappings.
These are the valid argument combinations:
pname='attr_name'
pname='attr_name', _alias='Mass [kg]'
pname='attr_name', maps=Pmod
pname='attr_name', maps=Pstep
pname='attr_name', maps=Pstep, _alias='Mass [kg]'
Parameters: - pname (str) – this pstep’s name which must coincide with the name of
the parent-pstep’s attribute holding this pstep.
It is stored at
_orig
and if noalias
and unmapped by pmod, this becomes thealias
. To create an “absolute” pstep, do not set this or alias args. - or Pstep maps (Pmod) –
It can be either:
- the mappings for this pstep,
- another pstep to clone attributes from (used when replacing an existing child-pstep), or
- None.
The mappings will apply only if
Pmod.descend()
matchpname
and will derive the alias. - alias (str) – Will become the super-str object when no mappings are specified
(
maps
is a dict from some prototype pstep) It gets jsonpointer-escaped if it exists (seepandata.escape_jsonpointer_part()
) - tags – Arguments for calling
_tag()
afterwards.
-
_derrive_map_tuples
()[source]¶ Recursively extract
(cmap --> alias)
pairs from the pstep-hierarchy.
-
_fix
¶ Sets
locked
=CAN_RENAME
.
Returns: self
Raises: ValueError if the step has been relocated
-
_iter_hierarchy
(prefix_steps=())[source]¶ Breadth-first traversing of pstep-hierarchy.
Parameters: prefix_steps (tuple) – Builds here branch currently visiting. Returns: yields the visited pstep along with its path (including it) Return type: (Pstep, [Pstep])
-
_lock
¶ Set
locked
=LOCKED
.
Returns: self, for chained use
Raises: ValueError if the step has been renamed/relocated
-
_locked
¶ Gets
_locked
internal flag, or screams on set when the step is already renamed/relocated.Prefer using one of
_fix
or_lock
instead.Parameters: locked – One of CAN_RELOCATE
,CAN_RENAME
,LOCKED
.Raise: ValueError when stricter lock-value on a renamed/relocated pstep
-
_paths
(with_orig=False, tag=None)[source]¶ Return all children-paths (str-list) constructed so far, in a list.
Parameters: Return type: [str]
Examples:
>>> p = Pstep()
>>> _ = p.a1._tag('inp').b._tag('inp').c
>>> _ = p.a2.b2
>>> p._paths()
['/a1/b/c', '/a2/b2']
>>> p._paths(tag='inp')
['/a1', '/a1/b']
For debugging set
with_orig
toTrue
:

>>> pmods = pmods_from_tuples([
...     ('', 'ROOT'),
...     ('/a', 'A/AA'),
... ])
>>> p = pmods.step()
>>> _ = p.a.b
>>> p._paths(with_orig=True)
['(-->ROOT)/(a-->A/AA)/b']
-
_schema
¶ Updates json-schema-v4 on this pstep (see
JSchema
).
- _csteps (dict) – the child-psteps by their name (default
-
pandalone.mappings.
_append_step
(steps, step)[source]¶ Joins
step
at the right ofsteps
, respecting ‘/’, ‘..’, ‘.’, ‘’.Parameters: Return type: Note
The empty-string(‘’) is the “root” for both
steps
andstep
. An empty-tuplesteps
is considered “relative”, equivalent to dot().
Example:
>>> _append_step((), 'a')
('a',)
>>> _append_step(('a', 'b'), '..')
('a',)
>>> _append_step(('a', 'b'), '.')
('a', 'b')
Note that an “absolute” path has the 1st-step empty(
''
), (so the previous paths above were all “relative”):

>>> _append_step(('a', 'b'), '')
('',)
>>> _append_step(('',), '')
('',)
>>> _append_step((), '')
('',)
Dot-dots preserve “relative” and “absolute” paths, respectively, and hence do not coalesce when at the left:
>>> _append_step(('',), '..')
('',)
>>> _append_step(('',), '.')
('',)
>>> _append_step(('a',), '..')
()
>>> _append_step((), '..')
('..',)
>>> _append_step(('..',), '..')
('..', '..')
>>> _append_step((), '.')
()
Single-dots(‘.’) just disappear:
>>> _append_step(('.',), '.')
()
>>> _append_step(('.',), '..')
('..',)
-
pandalone.mappings.
_clone_attrs
(obj)[source]¶ Clone deeply any collection attributes of the passed-in object.
-
pandalone.mappings.
_forbidden_pstep_attrs
= ('get_values', 'Series')¶ Psteps attributes excluded from magic-creation, because searched by pandas’s indexing code.
-
pandalone.mappings.
_join_paths
(*steps)[source]¶ Joins all path-steps in a single string, respecting
'/', '..', '.', ''
.Parameters: steps (str) – single json-steps, from left to right Return type: str Note
If you use
iter_jsonpointer_parts_relaxed()
to generate path-steps, the “root” is signified by the empty(''
) step; not the slash(/
)!Hence a lone slash(
/
) is split to an empty step after “root” like that: ('', ''), which generates just “root”(
, which generates just “root”(''
).Therefore a “folder” (i.e.
some/folder/
) when split equals ('some', 'folder', ''), which results again in the “root”(
, which results again in the “root”(''
)!Examples:
>>> _join_paths('r', 'a', 'b')
'r/a/b'
>>> _join_paths('', 'a', 'b', '..', 'bb', 'cc')
'/a/bb/cc'
>>> _join_paths('a', 'b', '.', 'c')
'a/b/c'
An empty-step “roots” the remaining path-steps:
>>> _join_paths('a', 'b', '', 'r', 'aa', 'bb')
'/r/aa/bb'
All
steps
have to be already split:

>>> _join_paths('a', 'b', '../bb')
'a/b/../bb'
Dot-dots preserve “relative” and “absolute” paths, respectively:
>>> _join_paths('..')
'..'
>>> _join_paths('a', '..')
'.'
>>> _join_paths('a', '..', '..', '..')
'../..'
>>> _join_paths('', 'a', '..', '..')
''
Some more special cases:
>>> _join_paths('..', 'a')
'../a'
>>> _join_paths('', '.', '..', '..')
''
>>> _join_paths('.', '..')
'..'
>>> _join_paths('..', '.', '..')
'../..'
See also
_append_step
-
pandalone.mappings.
pmods_from_tuples
(pmods_tuples)[source]¶ Turns a list of 2-tuples into a pmods hierarchy.
Each tuple defines the renaming-or-relocation of the final part of some component path onto another one into value-trees, such as:
(/rename/path, foo)       --> rename/foo
(relocate/path, foo/bar)  --> relocate/foo/bar
The “from” path may be:

- relative,
- absolute (starting with /), or
- “anywhere” (starting with //).

In case a “step” in the “from” path starts with a tilde (~), it is assumed to be a regular-expression, and the tilde is removed from it. The “to” path can make use of any “from” capture-groups:
('/~all(.*)/path', 'foo')
(r'~some[\d+]/path', 'foo\1')
('//~all(.*)/path', 'foo')
Parameters: pmods_tuples (list(tuple(str, str))) – Returns: a root pmod Return type: Pmod Example:
>>> pmods_from_tuples([
...     ('/a', 'A1/A2'),
...     ('/a/b', 'B'),
... ])
pmod({'': pmod({'a': pmod('A1/A2', {'b': pmod('B')})})})
>>> pmods_from_tuples([
...     ('/~a*', 'A1/A2'),
...     ('/a/~b[123]', 'B'),
... ])
pmod({'': pmod({'a': pmod(OrderedDict([(re.compile('b[123]'), pmod('B'))]))}, OrderedDict([(re.compile('a*'), pmod('A1/A2'))]))})
This is how you map root:
>>> pmods = pmods_from_tuples([
...     ('', 'relative/Root'),  ## Make all paths relative.
...     ('/a/b', '/Rooted/B'),  ## But mapped `b` would be "rooted".
... ])
>>> pmods
pmod({'': pmod('relative/Root', {'a': pmod({'b': pmod('/Rooted/B')})})})
>>> pmods.map_path('/a/c')
'relative/Root/a/c'
>>> pmods.map_path('/a/b')
'/Rooted/B'
But note that ‘/’ maps the 1st “empty-str” step after root:
>>> pmods_from_tuples([
...     ('/', 'New/Root'),
... ])
pmod({'': pmod({'': pmod('New/Root')})})
TODO: Implement “anywhere” matches.
-
pandalone.mappings.
pstep_from_df
(columns_df, name_col='names')[source]¶ Creates a
Pstep
instance from a dataframe.Parameters: columns_df (pd.DataFrame) – pstep’s mapped-names in
name_col
column, indexed by paths, and any additional pstep-attributes in the rest columns.example:
======== ========= ===================
paths    names     renames
======== ========= ===================
/A       foo       ['FOO', 'LL']
/B       bar       []
======== ========= ===================
4.3. Module: pandalone.components
¶
Defines the building-blocks of a “model”:
- components and assemblies:
- See
Component
,FuncComponent
andAssembly
. - paths and path-mappings (pmods):
- See
Pmod
,pmods_from_tuples()
andPstep
.
4.3.1. TODO¶
- Assembly should use a ComponentLoader, collecting components with:
getattr()
andfilter_predicate
default toattr.__name__.startswith('cfunc_')
.- enforce a
disable
flag on them.
- Component/assembly should have a stackable or common cwd?
- Components should be easy to run without “framework”.
-
_build()
–>run()
- pmods on init ORrun()
? - As ContextManager? - Imply a default Assembly.
-
class
pandalone.components.
Assembly
(components, name=None)[source]¶ Bases:
pandalone.components.Component
Example:
>>> def cfunc_f1(comp, value_tree):
...     comp.pinp().A
...     comp.pout().B
>>> def cfunc_f2(comp, value_tree):
...     comp.pinp().B
...     comp.pout().C
>>> ass = Assembly(FuncComponent(cfunc) for cfunc in [cfunc_f1, cfunc_f2])
>>> ass._build()
>>> assert list(ass._iter_validations()) == []
>>> ass._inp
['f1/A', 'f2/B']
>>> ass._out
['f1/B', 'f2/C']
>>> from pandalone.mappings import pmods_from_tuples
>>> pmod = pmods_from_tuples([
...     ('~.*', '/root'),
... ])
>>> ass._build(pmod)
>>> sorted(ass._inp + ass._out)
['/root/A', '/root/B', '/root/B', '/root/C']
-
class
pandalone.components.
Component
(name)[source]¶ Bases:
object
Encapsulates a function and its inputs/outputs dependencies.
It should be callable, and when executed it may read/modify the data-tree given as its 1st input.
An opportunity to fix the internal-state (i.e. inputs/output/name) is when the
_build()
is invoked.Variables: Mostly defined through cfuncs, which provide for defining a component with a single function with a special signature, see
FuncComponent
.-
__metaclass__
¶ alias of
abc.ABCMeta
-
-
class
pandalone.components.
FuncComponent
(cfunc, name=None)[source]¶ Bases:
pandalone.components.Component
Converts a “cfunc” into a component.
A cfunc is a function that modifies the values-tree with this signature:
cfunc_XXXX(comp, vtree)
where:
- comp:
- the
FuncComponent
associated with the cfunc - vtree:
- the part of the data-tree involving the values to be modified by the cfunc
It also works as a utility for developers of cfuncs, since it is passed as their 1st arg.
The cfuncs may use
pinp()
andpout()
when accessing its input and output data-tree values respectively. Note that accessing any of those attributes from outside of a cfunc would result in an error.If a cfunc accesses additional values with “fixed” paths, then it has to manually add those paths into the
_inp
and_out
lists.Example:
This would be a fully “relocatable” cfunc:
>>> def cfunc_calc_foobar_rate(comp, value_tree):
...     pi = comp.pinp()
...     po = comp.pout()
...
...     df = value_tree.get(pi)
...
...     df[po.Acc] = df[pi.V] / df[pi.T]
To get the unmodified component-paths, use:
>>> comp = FuncComponent(cfunc_calc_foobar_rate)
>>> comp._build()
>>> assert list(comp._iter_validations()) == []
>>> sorted(comp._inp + comp._out)
['calc_foobar_rate/Acc', 'calc_foobar_rate/T', 'calc_foobar_rate/V']
To get the path-modified component-paths, use:
>>> from pandalone.mappings import pmods_from_tuples
>>> pmods = pmods_from_tuples([
...     ('~.*', '/A/B'),
... ])
>>> comp._build(pmods)
>>> sorted(comp.pinp()._paths())
['/A/B/T', '/A/B/V']
>>> comp.pout()._paths()
['/A/B/Acc']
>>> sorted(comp._inp + comp._out)
['/A/B/Acc', '/A/B/T', '/A/B/V']
>>> comp._build(pmods)
>>> sorted(comp._inp + comp._out)
['/A/B/Acc', '/A/B/T', '/A/B/V']
4.4. Module: pandalone.pandata
¶
A pandas-model is a tree of strings, numbers, sequences, dicts, pandas instances and resolvable
URI-references, implemented by Pandel
.
-
class
pandalone.pandata.
JSONCodec
[source]¶ Bases:
object
Json coders/decoders capable of handling (almost) all python objects, by pickling them.
Example:
>>> import json
>>> obj_list = [
...     3.14,
...     {
...         'aa': pd.DataFrame([]),
...         2: np.array([]),
...         33: {'foo': 'bar'},
...     },
...     pd.DataFrame(np.random.randn(10, 2)),
...     ('b', pd.Series({})),
... ]
>>> for o in obj_list + [obj_list]:
...     s = json.dumps(o, cls=JSONCodec.Encoder)
...     oo = json.loads(s, cls=JSONCodec.Decoder)
...     #assert trees_equal(o, oo)
...
See also
For pickle-limitations: https://docs.python.org/3.7/library/pickle.html#pickle-picklable
-
class
Decoder
(*, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)[source]¶ Bases:
json.decoder.JSONDecoder
-
class
-
class
pandalone.pandata.
JSchema
[source]¶ Bases:
object
Facilitates the construction of json-schema-v4 nodes on
PStep
code.It does just a rudimentary args-name check. Further validation should be applied using a proper json-schema validator.
Parameters: - type – if omitted, derived as ‘object’ if it has children
- kws – for all the rest see http://json-schema.org/latest/json-schema-validation.html
-
class
pandalone.pandata.
ModelOperations
[source]¶ Bases:
pandalone.pandata.ModelOperations
Customization functions for traversing, I/O, and converting self-or-descendant branch (sub)model values.
-
static
__new__
(cls, inp=None, out=None, conv=None)[source]¶ Parameters: - inp (list) – the
args-list
toPandel._read_branch()
- out –
The args to
Pandel._write_branch()
, that may be specified either as:- an
args-list
, that will apply for all model data-types (lists, dicts & pandas), - a map of
type
–>args-list
, where theNone
key is the catch-all case, - a function returning the
args-list
for some branch-value, with signature:def get_write_branch_args(branch)
.
- conv –
The conversion-functions (convertors) for the various model’s data-types. The convertors have signature
def convert(branch)
, and they may be specified either as:- a map of
(from_type, to_type)
–>conversion_func()
, where theNone
key is the catch-all case, - a “master-switch” function returning the appropriate convertor
depending on the requested conversion.
The master-function’s signature is
def get_convertor(from_branch, to_branch)
.
The minimum convertors demanded by
Pandel
are (at least, check the code for more):- DataFrame <–> dict
- Series <–> dict
- ndarray <–> list
-
static
-
class
pandalone.pandata.
Pandel
(curate_funcs=())[source]¶ Bases:
object
Builds, validates and stores a pandas-model, a mergeable stack of JSON-schema abiding trees of strings and numbers, assembled with
- sequences,
- dictionaries,
pandas.DataFrame
,pandas.Series
, and- URI-references to other model-trees.
Overview
The making of a model involves, among others, schema-validating, reading subtree-branches from URIs, cloning, converting and merging multiple sub-models into a single unified-model tree, without side-effecting the given input. All these happen in 4+1 steps:
             ....................... Model Construction .................
------------ : _______    ___________                                    :
/ top_model /==>|Resolve|->|PreValidate|-+                               :
-----------' : |___0___|  |_____1_____|  |                               :
------------ : _______    ___________   |  _____   ________   ______ :   --------
/ base-model/==>|Resolve|->|PreValidate|-+->|Merge|->|Validate|->|Curate|==>/ model /
-----------' : |___0___|  |_____1_____|    |_ 2__|  |___3____|  |__4+__|:  -------'
             ............................................................
All steps are executed “lazily” using generators (with
yield
). Before proceeding to the next step, the previous one must have completed successfully. That way, any ad-hoc code in building-step-5(curation), for instance, will not suffer a horrible death due to badly-formed data.[TODO] The storing of a model simply involves distributing model parts into different files and/or formats, again without side-effecting the unified-model.
Building model
Here is a detailed description of each building-step:
_resolve()
and substitute any json-references present in the submodels with content-fragments fetched from the referred URIs. The submodels are cloned first, to avoid side-effecting them.Although by default a combination of JSON and CSV files is expected, this can be customized, either by the content in the json-ref, within the model (see below), or as explained below.
The extended json-refs syntax supported provides for passing arguments into
_read_branch()
and_write_branch()
methods. The syntax is easier to explain by showing what the default_global_cntxt
corresponds to, for aDataFrame
:

{
    "$ref": "http://example.com/example.json#/foo/bar",
    "$inp": ["AUTO"],
    "$out": ["CSV", "encoding=UTF-8"]
}
And here what is required to read and (later) store into a HDF5 local file with a predefined name:
{
    "$ref": "file://./filename.hdf5",
    "$inp": ["AUTO"],
    "$out": ["HDF5"]
}
Warning
Step NOT IMPLEMENTED YET!
Loosely
_prevalidate()
each sub-model separately with json-schema, where any pandas-instances (DataFrames and Series) are left as is. It is the duty of the developer to ensure that the prevalidation-schema is loose enough that it allows for various submodel-forms, prior to merging, to pass.Recursively clone and
_merge()
sub-models in a single unified-model tree. Branches from sub-models higher in the stack override the respective ones from the sub-models below, recursively. Different object types need to be converted appropriately (ie. merging adict
with aDataFrame
results into aDataFrame
, so the dictionary has to be converted to a dataframe).The required conversions into pandas classes can be customized as explained below. Series and DataFrames cannot merge together, and Sequences do not merge with any other object-type (themselves included); they just “overwrite”.
The default convertor-functions defined both for submodels and models are listed in the following table:
=========  =========  ==============================
From:      To:        Method:
=========  =========  ==============================
dict       DataFrame  pd.DataFrame (the constructor)
DataFrame  dict       lambda df: df.to_dict('list')
dict       Series     pd.Series (the constructor)
Series     dict       lambda sr: sr.to_dict()
=========  =========  ==============================
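The default convertor pairs listed above can be pictured as a map keyed by `(from_type, to_type)`. This is a sketch of how such a convertor-map is used; Pandel wires the real ones up internally:

```python
import pandas as pd

# The default convertor-functions from the table, keyed by
# (from_type, to_type) pairs.
convertors = {
    (dict, pd.DataFrame): pd.DataFrame,                    # the constructor
    (pd.DataFrame, dict): lambda df: df.to_dict('list'),
    (dict, pd.Series): pd.Series,                          # the constructor
    (pd.Series, dict): lambda sr: sr.to_dict(),
}

d = {'a': [1, 2], 'b': [3, 4]}
df = convertors[(dict, pd.DataFrame)](d)          # dict --> DataFrame
round_tripped = convertors[(pd.DataFrame, dict)](df)  # and back
```

Note that the dict/DataFrame pair round-trips cleanly because `to_dict('list')` emits one list per column, matching the constructor's input.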
Strictly json-
_validate()
the unified-model (ie enforcingrequired
schema-rules).The required conversions from pandas classes can be customized as explained below.
The default convertor-functions are the same as above.
(Optionally) Apply the
_curate()
functions on the the model to enforce dependencies and/or any ad-hoc generation-rules among the data. You can think of bash-like expansion patterns, like${/some/path:=$HOME}
or expressions like%len(../other/path)
.
Storing model
When storing model-parts, if unspecified, the filenames to write into will be deduced from the jsonpointer-path of the
$out
’s parent, by substituting “strange” chars with underscores(
).Warning
Functionality NOT IMPLEMENTED YET!
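Such a deduction could be sketched like this; which characters count as "strange" is an assumption of this sketch, and the path is hypothetical:

```python
import re

def deduce_filename(json_path):
    # Substitute any char unsafe in filenames with an underscore,
    # after dropping the leading/trailing slashes of the pointer-path.
    return re.sub(r'[^\w.-]', '_', json_path.strip('/'))

deduce_filename('/some branch/df$out')  # hypothetical jsonpointer-path
```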
Customization
Some operations within steps (namely conversion and IO) can be customized by the following means (from lower to higher precedence):
The global-default
ModelOperations
instance on the_global_cntxt
, applied on both submodels and unified-model.For example to channel the whole reading/writing of models through HDF5 data-format, it would suffice to modify the
_global_cntxt
like that:

pm = FooPandelModel()  ## some concrete model-maker
io_args = ["HDF5"]
pm.mod_global_operations(inp=io_args, out=io_args)
[TODO] Extra-properties on the json-schema applied on both submodels and unified-model for the specific path defined. The supported properties are the non-functional properties of
ModelOperations
.
- Specific-properties regarding IO operations within each submodel - see the resolve building-step, above.
Context-maps of
json_paths
–>ModelOperations
instances, installed byadd_submodel()
andunified_contexts
on the model-maker. They apply to the self-or-descendant subtree of each model.The
json_path
is a string obeying a simplified json-pointer syntax (no char-normalizations yet), i.e.
. An empty-string(''
) matches the whole model.When multiple convertors match a model-value, the one selected is the most specific (the one with the longest prefix). For instance, on the model:
[ { "foo": { "bar": 0 } } ]
all of the following would match the
0
value:- the global-default
_global_cntxt
, /
, and/0/foo
but only the last’s context-props will be applied.
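The longest-prefix rule can be sketched as follows; `select_context` and the plain-string contexts are hypothetical illustrations, not Pandel's API:

```python
def select_context(path, contexts):
    """Pick the context whose json-path is the longest prefix of `path`.

    `contexts` maps json-paths to context objects; the empty string ''
    is the global catch-all, so there is always at least one match.
    """
    matches = [p for p in contexts if path.startswith(p)]
    return contexts[max(matches, key=len)]

contexts = {'': 'global-default', '/': 'root-ctx', '/0/foo': 'foo-ctx'}
```

With these contexts, `/0/foo/bar` selects the `/0/foo` entry, while any other rooted path falls back to `/`, and everything else to the global default.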
Attributes
-
model
¶ The model-tree that will receive the merged submodels after
build()
has been invoked. Depending on the submodels, the top-value can be any of the supported model data-types.
-
_submodel_tuples
¶ The stack of (
submodel
,path_ops
) tuples. The list’s 1st element is the base-model, the last one, the top-model. Use theadd_submodel()
to build this list.
-
_global_cntxt
¶ A
ModelOperations
instance acting as the global-default context for the unified-model and all submodels. Usemod_global_operations()
to modify it.
-
_curate_funcs
¶ The sequence of curate functions to be executed as the final step by
_curate()
. They are “normal” functions (not generators) with signature:def curate_func(model_maker): pass ## ie: modify ``model_maker.model``.
Better to specify this list of functions at construction time.
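For illustration, a curate-function could look like this; the stand-in model-maker class and the `total` rule are hypothetical, only the `def curate_func(model_maker)` signature comes from above:

```python
class FakeModelMaker:
    # A stand-in exposing only the `model` attribute a curate-function uses.
    def __init__(self, model):
        self.model = model

def curate_totals(model_maker):
    # Enforce an ad-hoc generation-rule among the data:
    # derive `total` from the other values.
    mdl = model_maker.model
    mdl['total'] = mdl['a'] + mdl['b']

mm = FakeModelMaker({'a': 1, 'b': 2})
curate_totals(mm)
```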
-
_errored
¶ An internal boolean flag that becomes
True
if any build-step has failed, to halt proceeding to the next one. It isNone
if build has not started yet.
Examples
The basic usage requires to subclass your own model-maker, just so that a json-schema is provided for both validation-steps, 2 & 4:
>>> from collections import OrderedDict as od ## Json is better with stable keys-order
>>> class MyModel(Pandel):
...     def _get_json_schema(self, is_prevalidation):
...         return {  ## Define the json-schema.
...             '$schema': 'http://json-schema.org/draft-04/schema#',
...             'required': [] if is_prevalidation else ['a', 'b'],  ## Prevalidation is more loose.
...             'properties': {
...                 'a': {'type': 'string'},
...                 'b': {'type': 'number'},
...                 'c': {'type': 'number'},
...             }
...         }
Then you can instantiate it and add your submodels:
>>> mm = MyModel()
>>> mm.add_submodel(od(a='foo', b=1))             ## submodel-1 (base)
>>> mm.add_submodel(pd.Series(od(a='bar', c=2)))  ## submodel-2 (top-model)
You then have to build the final unified-model (any validation errors would be reported at this point):
>>> mdl = mm.build()
Note that you can also access the unified-model in the
model
attribute. You can now interrogate it:

>>> mdl['a'] == 'bar'  ## Value overridden by top-model
True
>>> mdl['b'] == 1      ## Value left intact from base-model
True
>>> mdl['c'] == 2      ## New value from top-model
True
Let’s try to build with invalid submodels:
>>> mm = MyModel()
>>> mm.add_submodel({'a': 1})         ## According to the schema, this should have been a string,
>>> mm.add_submodel({'b': 'string'})  ## and this one, a number.
>>> sorted(mm.build_iter(), key=lambda ex: ex.message)  ## Fetch a list with all validation errors. # doctest: +NORMALIZE_WHITESPACE
[<ValidationError: "'string' is not of type 'number'">,
 <ValidationError: "1 is not of type 'string'">,
 <ValidationError: 'Gave-up building model after step 1.prevalidate (out of 4).'>]
>>> mdl = mm.model
>>> mdl is None  ## No model constructed, failed before merging.
True
And let’s try to build with valid submodels but an invalid merged-one:
>>> mm = MyModel()
>>> mm.add_submodel({'a': 'a str'})
>>> mm.add_submodel({'c': 1})
>>> sorted(mm.build_iter(), key=lambda ex: ex.message)  # doctest: +NORMALIZE_WHITESPACE
[<ValidationError: "'b' is a required property">,
 <ValidationError: 'Gave-up building model after step 3.validate (out of 4).'>]
-
__init__
(curate_funcs=())[source]¶ Parameters: curate_funcs (sequence) – See _curate_funcs
.
-
__metaclass__
¶ alias of
abc.ABCMeta
-
_curate
()[source]¶ Step-4: Invokes any curate-functions found in
_curate_funcs
.
-
_get_json_schema
(is_prevalidation)[source]¶ Returns: a json schema for each case, looser for prevalidation
Return type: dictionary
-
_select_context
(path, branch)[source]¶ Finds which context to use while visiting model-nodes, by enforcing the precedence-rules described in the Customizations.
Parameters: Returns: the selected
ModelOperations
-
add_submodel
(model, path_ops=None)[source]¶ Pushes on top a submodel, along with its context-map.
Parameters: - model – the model-tree (sequence, mapping, pandas-types)
- path_ops (dict) – A map of
json_paths
–>ModelOperations
instances acting on the unified-model. Thepath_ops
may often be empty.
Examples
To change the default DataFrame –> dictionary convertor for a submodel, use the following:
>>> mdl = {'foo': 'bar'}
>>> submdl = ModelOperations(mdl, conv={(pd.DataFrame, dict):
...                                     lambda df: df.to_dict('records')})
-
build
()[source]¶ Attempts to build the model by exhausting
build_iter()
, or raises its 1st error.Use this method when you do not want to waste time getting the full list of errors.
-
build_iter
()[source]¶ Iteratively build model, yielding any problems as
ValidationError
instances.For debugging, the unified model at
model
may contain intermediate results at any time, even if construction has failed. Check the
_errored
flag if necessary.
-
mod_global_operations
(operations=None, **cntxt_kwargs)[source]¶ Since it is the fall-back operation for conversions and IO operations, it must exist and have all its props well-defined for the class to work correctly.
Parameters: - operations (ModelOperations) – Replaces values of the installed context with non-empty values from this one.
- cntxt_kwargs – Replaces the keyworded-values on the existing
operations
. SeeModelOperations
for supported keywords.
-
unified_contexts
¶ A map of
json_paths
–>ModelOperations
instances acting on the unified-model.
-
pandalone.pandata.
PandelVisitor
(schema, resolver=None, format_checker=None, auto_default: Optional[bool] = True, auto_default_nulls: Optional[bool] = False, auto_remove_nulls: Optional[bool] = False)[source]¶ A customized jsonschema-validator supporting instance-trees with pandas and numpy objects, natively.
Parameters: - auto_default –
When the tri-state bool
autoDefault
in schema or this param are enabled, it applies any schema’sdefault
value if a property is missing and schema’stype
does not supportnulls
.- Independent of
auto_default_nulls
(you may enable both). - See meth:
_rule_auto_defaults_properties
.
- Independent of
- auto_default_nulls –
When the tri-state bool
autoDefaultNull
in schema or this param are enabled, it applies any schema’s
default
value if the property isnull
and schema’stype
does not supportnulls
.- Independent of
auto_default
(you may enable both). - Take precedence over
auto_remove_nulls
. - See meth:
_rule_auto_defaults_properties
.
- auto_remove_nulls –
When the tri-state bool
autoRemoveNull
in schema or this param are enabled, it removes a
null
property value if the schema’stype
does not acceptnulls
.- See meth:
_rule_auto_defaults_properties
.
Attention
If this is enabled, any
required
properties rule must FOLLOW theproperties
rule.
Any pandas or numpy instance (for example
obj
) is treated like that:Python Type JSON Equivalence pandas.DataFrame
as
object
json-type, with: keys:obj.columns
(MUST be strings) values:obj[col].values
NOTE: len(df) on rows(!), not columns.
pandas.Series
- as
object
json-type, with: keys:obj.index
(MUST be strings) values:obj.values
- as
array
json-type
np.ndarray
as array
json-type IF ndim == 1cabc.Sequence
as array
IF not string (like lists, tuples)Note that the value of each dataFrame column is a :
ndarray
instances.The simplest validations of an object or a pandas-instance is like this:
>>> import pandas as pd
>>> schema = {
...     'type': 'object',
... }
>>> pv = PandelVisitor(schema)
>>> pv.validate({'foo': 'bar'})
>>> pv.validate(pd.Series({'foo': 1}))
>>> pv.validate([1,2])  ## A sequence is invalid here.
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: [1, 2] is not of type 'object'
<BLANKLINE>
Failed validating 'type' in schema:
    {'type': 'object'}
<BLANKLINE>
On instance:
    [1, 2]
Or demanding specific properties with required and no additionalProperties:

>>> schema = {
...     'type': 'object',
...     'properties': {
...         'foo': {}
...     },
...     'required': ['foo'],
...     'additionalProperties': False,
... }
>>> pv = PandelVisitor(schema)
>>> pv.validate(pd.Series({'foo': 1}))
>>> pv.validate(pd.Series({'foo': 1, 'bar': 2}))  ## Additional 'bar' is present!
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('bar' was unexpected)
<BLANKLINE>
Failed validating 'additionalProperties' in schema:
    {'additionalProperties': False,
     'properties': {'foo': {}},
     'required': ['foo'],
     'type': 'object'}
<BLANKLINE>
On instance:
    foo    1
    bar    2
    dtype: int64

>>> pv.validate(pd.Series({}))  ## Required 'foo' missing!
Traceback (most recent call last):
...
jsonschema.exceptions.ValidationError: 'foo' is a required property
<BLANKLINE>
Failed validating 'required' in schema:
    {'additionalProperties': False,
     'properties': {'foo': {}},
     'required': ['foo'],
     'type': 'object'}
<BLANKLINE>
On instance:
    Series([], dtype: float64)
-
pandalone.pandata.
_U
¶ alias of
pandalone.pandata.United
-
pandalone.pandata.
_find_additional_properties
(instance, schema)[source]¶ Return the set of additional properties for the given
instance
.Weeds out properties that should have been validated by
properties
and / orpatternProperties
.Assumes
instance
is dict-like already.
-
pandalone.pandata.
_rule_auto_defaults_properties
(validator, properties, instance, schema, original_props_rule, auto_default, auto_default_nulls, auto_remove_nulls)[source]¶ Adapted from: https://python-jsonschema.readthedocs.io/en/stable/faq/#frequently-asked-questions
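The FAQ recipe this rule adapts extends a validator so that visiting an object's properties also fills in missing defaults. A minimal, dependency-free sketch of that idea (the function name apply_auto_defaults and its exact semantics are illustrative, not pandalone's actual code):

```python
def apply_auto_defaults(instance: dict, schema: dict,
                        auto_default=True,
                        auto_default_nulls=False,
                        auto_remove_nulls=False) -> dict:
    """Fill/clean ``instance`` in-place from ``schema['properties']`` defaults."""
    for prop, subschema in schema.get('properties', {}).items():
        has_default = 'default' in subschema
        if auto_default and has_default and prop not in instance:
            # Missing property: apply the schema's default.
            instance[prop] = subschema['default']
        elif prop in instance and instance[prop] is None:
            if auto_default_nulls and has_default:
                # Null property: replace with the default
                # (takes precedence over auto_remove_nulls).
                instance[prop] = subschema['default']
            elif auto_remove_nulls:
                # Null property with no replacement: drop it.
                del instance[prop]
    return instance
```

In the real validator this logic runs inside the properties rule, which is why a required rule must follow it (defaults are filled before required-ness is checked).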
-
pandalone.pandata.
_units_cleaner_regex
= re.compile('^[<[]|[\\]>]$')¶ Strips the enclosing brackets from an item-descriptor with units, i.e. used as a table-column header.
-
pandalone.pandata.
first_defined
(*var, default=None)[source]¶ Return the 1st non-none
var
, ordefault
.
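Its behaviour can be sketched in a couple of lines (an illustrative equivalent, not the library's source):

```python
def first_defined(*var, default=None):
    # Return the first argument that is not None; else `default`.
    return next((v for v in var if v is not None), default)
```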
-
pandalone.pandata.
iter_jsonpointer_parts
(jsonpath)[source]¶ Generates the
jsonpath
parts according to the jsonpointer spec.
Parameters: jsonpath (str) – a jsonpath to resolve within document
Returns: The parts of the path (as a generator), without converting any step to int.
Author: Julian Berman, ankostis
Examples:
>>> list(iter_jsonpointer_parts('/a/b'))
['a', 'b']
>>> list(iter_jsonpointer_parts('/a//b'))
['a', '', 'b']
>>> list(iter_jsonpointer_parts('/'))
['']
>>> list(iter_jsonpointer_parts(''))
[]
But paths are strings beginning (not implemented: but not ending) with a slash ('/'):
>>> list(iter_jsonpointer_parts(None))
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'split'

>>> list(iter_jsonpointer_parts('a'))
Traceback (most recent call last):
jsonschema.exceptions.RefResolutionError: Jsonpointer-path(a) must start with '/'!

#>>> list(iter_jsonpointer_parts('/a/'))
#Traceback (most recent call last):
#jsonschema.exceptions.RefResolutionError: Jsonpointer-path(a) must NOT ends with '/'!
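A dependency-free sketch reproducing the documented behaviour (jsonpointer_parts is a hypothetical name; the RFC 6901 ~1/~0 unescaping is an assumption not shown in the examples above, and ValueError stands in for the library's RefResolutionError):

```python
def jsonpointer_parts(jsonpath):
    # Yield the '/'-separated steps of a JSON-pointer, keeping empty
    # steps and never converting numeric steps to int.
    if jsonpath == '':
        return
    if not jsonpath.startswith('/'):
        raise ValueError("Jsonpointer-path(%s) must start with '/'!" % jsonpath)
    for part in jsonpath[1:].split('/'):
        # RFC 6901 unescaping: '~1' -> '/' before '~0' -> '~',
        # so the escaped sequence '~01' decodes to the literal '~1'.
        yield part.replace('~1', '/').replace('~0', '~')
```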
-
pandalone.pandata.
iter_jsonpointer_parts_relaxed
(jsonpointer)[source]¶ Like
iter_jsonpointer_parts()
but accepting also non-absolute paths.The 1st step of absolute-paths is always ‘’.
Examples:
>>> list(iter_jsonpointer_parts_relaxed('a'))
['a']
>>> list(iter_jsonpointer_parts_relaxed('a/'))
['a', '']
>>> list(iter_jsonpointer_parts_relaxed('a/b'))
['a', 'b']
>>> list(iter_jsonpointer_parts_relaxed('/a'))
['', 'a']
>>> list(iter_jsonpointer_parts_relaxed('/a/'))
['', 'a', '']
>>> list(iter_jsonpointer_parts_relaxed('/'))
['', '']
>>> list(iter_jsonpointer_parts_relaxed(''))
['']
-
pandalone.pandata.
parse_value_with_units
(arg)[source]¶ Parses name-units pairs (i.e. used as a table-column header).
Returns: a United(name, units) named-tuple, or None on bad syntax; note that a missing name yields name='' whereas missing units yield units=None.
Examples:
>>> parse_value_with_units('value [units]')
United(name='value', units='units')
>>> parse_value_with_units('foo bar <bar/krow>')
United(name='foo bar', units='bar/krow')
>>> parse_value_with_units('no units')
United(name='no units', units=None)
>>> parse_value_with_units('')
United(name='', units=None)
But notice:
>>> assert parse_value_with_units('ok but [bad units') is None
>>> parse_value_with_units('<only units>')
United(name='', units='only units')
>>> parse_value_with_units(None)
Traceback (most recent call last):
TypeError: expected string or ...
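An illustrative reimplementation that matches all the documented examples; the regex and names here (parse_units, _NAME_UNITS, United) are this sketch's own, not the library's internals:

```python
import re
from collections import namedtuple

United = namedtuple('United', ('name', 'units'))

# A name, optionally followed by units enclosed in [..] or <..>.
_NAME_UNITS = re.compile(
    r'^\s*(?P<name>[^<[]*?)\s*(?:\[(?P<u1>[^\]]*)\]|<(?P<u2>[^>]*)>)?\s*$')

def parse_units(text):
    m = _NAME_UNITS.match(text)
    if not m:
        return None  # bad syntax, e.g. an unclosed bracket
    u1, u2 = m.group('u1'), m.group('u2')
    return United(m.group('name'), u1 if u1 is not None else u2)
```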
-
pandalone.pandata.
resolve_jsonpointer
(doc, jsonpointer, default=<object object>)[source]¶ Resolve a
jsonpointer
within the referenced doc.
Parameters: - doc – the referent document
- jsonpointer (str) – a jsonpointer to resolve within document
- default – A value to return if the path does not resolve.
Returns: the resolved doc-item
Raises: RefResolutionError (if it cannot resolve the path and no default is given)
Examples:
>>> dt = {
...     'pi': 3.14,
...     'foo': 'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc': 'def'}),
...     },
... }
>>> resolve_jsonpointer(dt, '/pi', default=_scream)
3.14

>>> resolve_jsonpointer(dt, '/pi/BAD')
Traceback (most recent call last):
jsonschema.exceptions.RefResolutionError: Unresolvable JSON pointer('/pi/BAD')@(BAD)

>>> resolve_jsonpointer(dt, '/pi/BAD', 'Hi!')
'Hi!'
Author: Julian Berman, ankostis
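The resolution loop can be sketched without dependencies like this (resolve_pointer and _SCREAM are illustrative names; the real function raises jsonschema's RefResolutionError where this sketch raises KeyError):

```python
_SCREAM = object()  # sentinel meaning "no default was given"

def resolve_pointer(doc, jsonpointer, default=_SCREAM):
    # Walk each '/'-separated step, indexing mappings by key and
    # sequences by integer position; fall back to `default` (or raise).
    node = doc
    for part in (jsonpointer.lstrip('/').split('/') if jsonpointer else []):
        try:
            node = node[int(part) if isinstance(node, (list, tuple)) else part]
        except (KeyError, IndexError, TypeError, ValueError):
            if default is _SCREAM:
                raise KeyError(
                    "Unresolvable JSON pointer(%r)@(%s)" % (jsonpointer, part))
            return default
    return node
```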
-
pandalone.pandata.
resolve_path
(doc, path, default=<object object>, root=None)[source]¶ Like
resolve_jsonpointer()
also for relative-paths & attribute-branches.
Parameters: - doc – the referent document
- path (str) – An absolute or relative path to resolve within document.
- default – A value to return if the path does not resolve.
- root – Document for absolute paths, assumed doc if missing.
Returns: the resolved doc-item
Raises: RefResolutionError (if it cannot resolve the path and no default is given)
Examples:
>>> dt = {
...     'pi': 3.14,
...     'foo': 'bar',
...     'df': pd.DataFrame(np.ones((3,2)), columns=list('VN')),
...     'sub': {
...         'sr': pd.Series({'abc': 'def'}),
...     },
... }
>>> resolve_path(dt, '/pi', default=_scream)
3.14

>>> resolve_path(dt, 'df/V')
0    1.0
1    1.0
2    1.0
Name: V, dtype: float64

>>> resolve_path(dt, '/pi/BAD', 'Hi!')
'Hi!'
Author: Julian Berman, ankostis
-
pandalone.pandata.
rule_enum
(validator, enums, instance, schema)[source]¶ Overridden to evade pandas-equals after Julian/jsonschema#575 fixed bool != 0,1 (v3.0.2).
-
pandalone.pandata.
set_jsonpointer
(doc, jsonpointer, value, object_factory=<class 'collections.OrderedDict'>)[source]¶ Set the value of the node pointed to by jsonpointer within the referenced doc, creating missing intermediate nodes with object_factory.
Parameters: - doc – the referent document
- jsonpointer (str) – a jsonpointer to the node to modify
Raises: RefResolutionError (if jsonpointer is empty, missing, or of invalid content)
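The setting logic can be sketched like so (set_pointer is an illustrative name; the real function raises RefResolutionError and also handles sequence steps, which this mapping-only sketch omits):

```python
from collections import OrderedDict

def set_pointer(doc, jsonpointer, value, object_factory=OrderedDict):
    # Walk to the parent of the pointed-to node, growing any missing
    # intermediate mappings with `object_factory`, then assign the value.
    if not jsonpointer.startswith('/'):
        raise ValueError("Jsonpointer-path(%s) must start with '/'!" % jsonpointer)
    *parents, last = jsonpointer[1:].split('/')
    node = doc
    for part in parents:
        if not isinstance(node.get(part), dict):
            node[part] = object_factory()  # create the missing branch
        node = node[part]
    node[last] = value
    return doc
```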