R | Daniel Antal

Create Datasets that are Easy to Combine and Reuse

Tue, 22 Nov 2022 09:09:00 +0100

The latest Reprex R package, dataset, was released today on the Comprehensive R Archive Network (CRAN). It is a very early, conceptual package that will help make scientific achievements more open, make governmental data easier to find, and store information in a form that can be combined more easily.

Data interoperability is almost a buzzword, yet we see very few comprehensive, good solutions to apply it. Try to find information on open government portals or on big open science repositories—apart from a few good examples, most datasets are as disorganized as any PC’s hard disk that is collecting dust in a shed.

The dataset package aims to bring together the best practices of data semantics, data organization, and the use of standard metadata to make sure that whatever you store in a data table is immediately available for data analysis, activation, or combination in any new database.

Ambitious? It is, and dataset 0.1.9 is a very experimental product. While our other packages are aimed at intermediate users with a clear use case in mind, dataset at this point is aimed at package developers. Casual or even heavy R users are unlikely to download it as a standalone product. Instead, dataset aims to be a stable developer basis for our existing products, rOpenGov packages, and many new uses.

Download dataset

The metadata aim of dataset is to add standardized metadata to R data.frames, tibbles, data.tables, and other similarly structured tabular objects. The organizational and semantic objectives are to bring the tidy data concept closer to the datacube model, which is the basis of all statistical data exchanges, and to W3C standards, which foster machine-to-machine data communication on traditional web APIs and the semantic web.

This approach:
Makes data importing easier and less error-prone;
Leaves plenty of room for documentation automation, resulting in far better reusability and reproducibility;
Makes it far easier to publish results from R following the FAIR principles, so that the work of the R user becomes more findable, accessible, interoperable, and reusable by other users;
Makes placement into relational databases, semantic web applications, archives, and repositories possible without time-consuming and costly data wrangling (see From dataset To RDF).
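The idea of carrying descriptive metadata together with a tabular object can be sketched in base R. This is only an illustration under stated assumptions: the helper `add_metadata()` and the field names `title`, `creator`, and `issued` are hypothetical and are not the dataset package's API, and the numbers are made up.

```r
# Hypothetical helper (NOT part of the dataset package): attach a few
# descriptive metadata fields to a data.frame as ordinary R attributes.
add_metadata <- function(df, title, creator, issued) {
  stopifnot(is.data.frame(df))
  attr(df, "title")   <- title
  attr(df, "creator") <- creator
  attr(df, "issued")  <- issued
  df
}

# A small illustrative table; the values are invented for this sketch.
indicator <- data.frame(
  geo   = c("AT", "HU"),
  year  = c(2020L, 2020L),
  value = c(1.5, 2.5)
)

indicator_described <- add_metadata(
  indicator,
  title   = "Example indicator",
  creator = "Jane Doe",
  issued  = as.Date("2022-11-22")
)

# The metadata now travels with the object and can be read back.
attr(indicator_described, "title")
attr(indicator_described, "issued")
```

Plain attributes like these may be lost by some data transformations, which is one reason a dedicated, standardized structure is attractive for package developers.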

The first official release offers few immediate benefits. However, if you are an R package developer, we can bring you a few steps closer to releasing your data products in a way that conforms to the FAIR metadata principles. We can also take a few steps to streamline your data wrangling, make integration with relational databases easier, and move towards the semantic web.

Learn R with Reprex

Fri, 07 Oct 2022 12:35:00 +0200

Big Data Creates Inequalities

Only the largest corporations, best-endowed universities, and rich governments can afford data collection and processing capacities that are large enough to harness the advantages of AI.

Big data that works for all

No matter how big the problem or how small your team, Reprex fills your reports, dashboards, newsletters, and books with data and visualizations.
Learn R with us: you can reduce the inequalities by joining the open source movement, learning to run open source software, asking for help, improving the tutorials and the documentation, and eventually learning to make the computer work for you.
Contributor Covenant: Participating in open source is often a highly collaborative experience. We’re encouraged to create in public view, and we’re incentivized to welcome contributions of all kinds from people around the world. This makes the practice of open source as much social as it is technical.

Get Inspired

Find more interesting and better data: you don’t have to be a data scientist or write code to contribute to our projects.
Data feminism: Catherine D’Ignazio and Lauren Klein present a new way of thinking about data science and data ethics—one that is informed by intersectional feminist thought. Highly inspirational, free, open-source book.
RLadies is a world-wide organization to promote gender diversity in the R community.

Contributor Covenant

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.

Run code from tutorials

retroharmonize.dataobservatory.eu
🖱 Get started
[🖱️ Articles](https://retroharmonize.dataobservatory.eu/articles/index.htm)

Find help, ask for help: reprex

Documentation for better tutorials

Debugging and testing code

Contribute to documentation

R is a functional language

R is both a statistical environment and a programming language
R, the open source and further developed version of the S language, is mainly functional
If you have done a task at least twice, the third time you had better write a function so that you can keep repeating it without extra effort.
Most of your effort will be to find a well-written function for your work
If you cannot find a function, you will modify somebody else’s function, or eventually write your own
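The points above can be sketched with a small example. The function name `rescale01()` and the sample data are made up for illustration; the sketch only shows a repeated task wrapped in a function and then passed to another function.

```r
# A task we caught ourselves repeating by hand: rescale numbers to the 0-1 range.
# Note: if all values are equal, the denominator is zero and the result is NaN.
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Functions are ordinary objects in R, so they can be handed to other
# functions such as lapply(), which applies them to each list element.
scores <- list(a = c(1, 5, 9), b = c(10, 20, 40))
rescaled <- lapply(scores, rescale01)

rescaled$a
# [1] 0.0 0.5 1.0
```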

R + YAML + markdown = web ready

Learn YAML in Y minutes: tell the computer what you want to do with a document
R Markdown basics: it is just plain markdown that lets you insert little R program ‘chunks’.
Awesome markdown editors and pre-writers: find a convenient tool
Google Docs to markdown: practice by translating your Google Docs text to markdown. It is very easy.
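To make the combination concrete, here is a minimal sketch that writes a tiny R Markdown file from R: a YAML header that says what to do with the document, some plain markdown, and one R chunk. The title and the chunk content are arbitrary choices for this illustration.

```r
# Build the code fence marker programmatically to keep this example readable.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",                                  # YAML header starts
  "title: \"My first report\"",
  "output: html_document",
  "---",                                  # YAML header ends
  "",
  "## A heading in plain markdown",
  "",
  "Some text, followed by a small R chunk:",
  "",
  paste0(fence, "{r}"),                   # chunk opens
  "summary(cars$speed)",                  # R code that runs when rendering
  fence                                   # chunk closes
)

# Write the document to a temporary .Rmd file.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# If the rmarkdown package is installed, the file could then be turned into
# a web page with: rmarkdown::render(rmd_path)
readLines(rmd_path)[1:4]
```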

Package and release: a team effort

Our open source development projects

🔢 dataset: Synchronize datasets with global knowledge hubs
#️⃣ statcodelists: Make your data codes understood globally
♻️ iotables: Create economic or environmental impact assessments in any EU country
🌍 regions: Create more granular statistics from raw survey data in any EU country
✅ retroharmonize: Harmonize question banks, recycle answers from past surveys
⏭️ all on one page

Create with us

Questions?

Email | Keybase

LinkedIn: Daniel Antal - Reprex | Home

Creating Algorithmic Tools to Interpret and Communicate Open Data Efficiently

Fri, 04 Jun 2021 10:00:00 +0000

As a developer at rOpenGov, what type of data do you usually use in your work?

As an academic data scientist whose research focuses on the development of general-purpose algorithmic methods, I work with a range of applications from life sciences to humanities. Population studies play a big role in our research, and often the information that we can draw from public sources - geospatial, demographic, environmental - provides invaluable support. We typically use open data in combination with sensitive research data, but some of the research questions can be readily addressed with open data from statistical authorities such as Statistics Finland or Eurostat.

In your ideal data world, what would be the ultimate dataset, or datasets that you would like to see in the Music Data Observatory?

One line of our research analyses the historical trends and spread of knowledge production, in particular book printing based on large-scale metadata collections. It would be interesting to extend this research to music, to understand the contemporary trends as well as the broader historical developments. Gaining access to a large systematic collection of music and composition data from different countries across long periods of time would make this possible.

Why did you decide to join the challenge and why do you think that this would be a game changer for researchers and policymakers?

Joining the challenge was a natural development based on our overall activities in this area; the rOpenGov project has been around for a decade now, since the early days of the broader open data movement. This has also created an active international developer network and we felt well equipped for picking up the challenge. The game changer for researchers is that the project highlights the importance of data quality, even when dealing with official statistics, and provides new methods to solve these issues efficiently through the open collaboration model. For policymakers, this provides access to new high-quality curated data and case studies that can support evidence-based decision-making.

Do you have a favorite, or most used open governmental or open science data source? What do you think about it? Could it be improved?

Regarding open government data, one of my favorites is not a single data source but a data representation standard. The px format is widely used by statistical authorities in various countries, and this has allowed us to create R tools that retrieve and analyze official statistics from many countries across Europe, spanning dozens of statistical institutions. Standardization of open data formats allows us to build robust algorithmic tools for downstream data analysis and visualization. Open government data is still too often shared in obscure, non-standard, or closed-source file formats, and this creates significant bottlenecks for the development of scalable and interoperable AI and machine learning methods that can harness the full potential of open data.

From your perspective, what do you see being the greatest problem with open data in 2021?

Although there are a variety of open data sources available (and the numbers continue to increase), the availability of open algorithmic tools to interpret and communicate open data efficiently is lagging behind. One of the greatest challenges for open data in 2021 is to demonstrate how we can maximize the potential of open data by designing smart tools for open data analytics.

What can our automated data observatories do to make open data more credible in the European economic policy community and be accepted as verified information?

The professional network backing the project, together with the possibility of critical feedback and later adoption by academic communities, will support these efforts. Transparency of the data harmonization operations is the key to credibility, and it will be further supported by concrete benchmarks that highlight the critical differences between conclusions drawn from original sources and those drawn from the harmonized, high-quality datasets.

How can we ensure the long-term sustainability of the efforts?

The open data space is so extensive that no single individual or institution can address all the emerging needs in this area. Open developer networks play a huge role in the development of algorithmic methods, and strong communities have developed around specific open data analytical environments such as R, Python, and Julia. These communities support networked collaboration and provide services such as software peer review. Long-term sustainability will depend on the support that such developer communities can receive, both from individual contributors and from institutions and governments.

Join us

Join our open collaboration Economy Data Observatory team as a data curator, developer, or business developer. More interested in environmental impact analysis? Try our Green Deal Data Observatory team! Or does your interest lie more in data governance, trustworthy AI, and other digital market problems? Check out our Digital Music Observatory team!
