Štefan Urbánek
Mentor, data infrastructure architect, data governance practitioner, software engineer, independent researcher.
I have worked in data warehousing and data governance for over 18 years as an architect, software engineer and consultant. Some of the companies I have worked for: Facebook, Squarespace, Anaconda, Orange. I have advised and worked for various non-governmental organizations, such as the Open Knowledge Foundation, to introduce industry practices into the space of open government data.
I am an open-source creator and contributor, a conference speaker, and the creator and lead developer of the open-source data warehouse toolkit Data Brewery and its flagship multi-dimensional analytical server, Cubes.
Artifacts: Github, Slideshare, Speakerdeck, Scribd.
Services
My core expertise is in business intelligence and data warehousing. I consult and assist in the areas of data warehouse architecture, data governance and data quality:
- designing and reviewing data infrastructure
- designing data quality management infrastructure
- designing data governance processes
- designing unified sources of truth - “master data”
- decommissioning of legacy or burdensome systems
- providing data engineering training
I prefer to work together with organisations, transferring knowledge to their engineers so that they become self-sufficient.
Non-professional
Outside of my professional work, I do independent research in the cross-disciplinary field of complex systems and their knowledge codification, unconventional computing and non-linear languages.
Expertise
Engineering Specialisation
- Data infrastructure architecture
- Systems for data governance and data quality management
- Metadata systems, metadata-based processing and metadata modelling
- Relational algebra
- Design of domain specific languages
Values: adaptability of systems, technology agnosticism of the ecosystem, and transparency of data quality.
Independent Research
- Researching the possibilities and engineering architecture of an ecosystem for developing a universal body of knowledge about complex systems
- Combining discoveries in biochemistry for the purpose of conceptual simulation of complex systems
- Design and development of domain specific languages and compilers
Work
- Facebook: Tech lead for data warehouse architecture for revenue data streams. Designed and developed a declarative, technology-agnostic ETL/data framework for building metadata-driven pipelines for multi-dimensional metrics.
- Squarespace: Tech lead for data warehouse migration; designed a new warehouse architecture. Introduced and integrated a metadata-driven OLAP server into the data ecosystem to ensure reporting consistency.
- Knowerce: Founder of a data consultancy serving a multinational clientele (Anaconda, Raiffeisen, Open Knowledge International, Pfizer, Transparency International, …)
- Orange Slovakia: Customer intelligence systems, datamart design and development, data integration.
Non-governmental Organisations:
- Open Knowledge International: Defined a foundation of data processing pipeline concepts for School of Data.
- Open Knowledge Labs: Provided business intelligence and data warehousing expertise, helping to form open-data standards.
- Transparency International Slovakia: Built the very first analytical open-data business intelligence portal in Slovakia (and the CEE region) for Open Public Procurements.
- Fair Play Alliance Slovakia: Developed the first open data portal with data quality management elements in Slovakia.
Open-Source Portfolio
Cubes
A multi-dimensional conceptual data framework and server. The main features are OLAP and aggregated browsing (with a relational database as the default backend), multi-dimensional analysis, a logical view of the analysed data, and hierarchical conceptual dimensions. The purpose was to focus on how analysts look at data and how they think of it, not on how the data are physically implemented in the data stores. The framework is SQL-dialect agnostic and uses relational algebra with dialect-specific compilers to generate concrete database queries.
Links: Project Home, Github sources, Documentation
Bubbles
An experimental Python framework for data processing and data quality measurement. The basic concepts are abstract data objects, operations and dynamic operation dispatch.
Links: Project Home, Github sources, Documentation
Expressions
A small utility library for embedding an arithmetic expression parser and compiler into other libraries and applications.
Links: Github (contains documentation)
StepTalk
A Smalltalk implementation on top of the Objective-C runtime, used as a scripting framework for creating scriptable servers or applications. Combined with the dynamism of the Objective-C language, StepTalk goes well beyond mere scripting. It is written using GNUstep.
Links: Github
AgentFarms
AgentFarms was a toolkit for multi-agent-based simulations written in Objective-C. It featured an iterative simulator, a simulation server, a data probing and collection mechanism, and a virtual laboratory application (Farmer) to control and visualize the simulation.
Links: Github
Other minor libraries from the past:
- XY - two-dimensional plotting in Objective-C/OpenStep/early Cocoa
- Development Kit - Objective-C source code generator
- Various contributions to the GNUstep project.
Independent Research
Sepro
A biochemistry-inspired programming language. Experimental research project.
Links: Slides, Document, Github
Conference Talks
- Data Natives 2019, Berlin: Forces and Threats in a Data Warehouse or why Architecture and Metadata Matters. (slides)
- PyData NYC 2014: Panel: Python in Business Intelligence
- PyData NYC 2014: Cubes 1.0 (Python OLAP) – new features (video, slides)
- Transparency Camp 2014, Washington DC, USA – Open-source OLAP
- Data Harvest 2014, Brussels – panel Journos and Codes cooperating; talk Lessons from Business Intelligence for Open Data; talk Data Governance – why and how?
- PyCon 2014, Montreal – Cubes – Distributed Data Warehouse (Scribd document/pdf)
- PyData 2012, New York – talk Python in Business Intelligence (video, slides); lightning talk Cubes - Lightweight OLAP (video); lightning talk PyData Academy (video, slides)
- PyTexas 2012, College Station, TX – lightning talk – Cubes OLAP
- Data Harvest, May 2012, Brussels – Open Public Procurements of Slovakia
- EuroPython 2012, Florence, Italy – talk and training: Cubes – lightweight Python OLAP (video, slides)
- BigClean 2011, Prague, Czech Republic – Open Data Data Quality (slides)
- Transparency Camp 2011, Washington DC, USA – Slovak Open Public Procurements; Cubes - open-source OLAP
- Open Knowledge Conference 2010, London, UK – Screen-scraping Slovak Public Procurements
- Transparency Camp 2010, Washington DC, USA – Data Camp and Data Camp ETL – first Slovak Open Data projects
- E-Democracy 2009, Berlin, Germany – Open Data in Slovakia and Data Camp - a data publishing application
- Znalosti 2004 (Knowledge 2004) – “The Trust – Evolutionary Simulation and Modelling”, February 2004
- ESUG 2003 – Smalltalk Conference – “StepTalk”, August 2003
- Cognition, Artificial Life and Computer Intelligence, Stará Lesná, Vysoké Tatry – “Learning of a System Using Simulation with Minimal Assumptions”, May 2003