open-data | Daniel Antal's homepage

Open Music Registers

Tue, 29 Apr 2025 00:00:00 +0000

About this Release

This technical paper is part of the Open Music Observatory under the Horizon Europe Open Music Europe project.
It presents an early framework for federated music registers and demonstrates how they can support rights management, cultural statistics, and business innovation.

The current edition describes the design principles and pilot implementations.
Future editions will extend the model with more data partners, stress-tested pipelines, and additional use cases.

Note: This is a technical release and should be cited using the DOI: 10.5281/zenodo.14767717.

Participate

We invite music industry partners, cultural institutions, and researchers to engage with the pilot registers and help refine the model.
Please visit the Zenodo record or the Open Music Observatory for more information.

Big Data for All: Building Collaborative Data Observatories

Thu, 03 Nov 2022 17:30:00 +0000

Reprex’s co-founder, Daniel Antal, talked about these issues at the Eindhoven Innovation Café. You can watch the recording of the livestream, which starts at 5 minutes and 22 seconds:

This is a past event. Check out our forthcoming events, write to Daniel Antal, or send an email.

The event invitation text and links

Big data and AI create inequalities. They put historically marginalized people, like ethnic minorities and womxn, at a disadvantage. Because AI and checking on AI require plenty of data, usually only giant corporations, the wealthiest governments, and university entities can make it work for them. Reprex is a Hague-based, international startup that wants to impact various sustainable development goals by enabling smaller organizations to join their smaller datasets, use open data, create linked available data, and collaboratively make a change.

Reprex is a finalist for the Hague Innovation Award for impact startup (please 🙏, vote for us!). Daniel Antal, one of the co-founders, will talk about their approach to building an international coalition of music organizations to pool data and challenge data monopolies using organizational techniques, a collaboration ethos, and data from the open-source developer world.

Using the example of independent music creators, who often find themselves in a position where it is more expensive to claim their money from global platforms, he will talk about how to reduce inequalities in the world of big data and AI with collaboration on web 3.0. In the Q&A he will take questions on how to apply their know-how, and generally linked open data to other art+tech or creative segments or problems for which everybody is too small, like meeting the Paris Accord greenhouse gas targets bit by bit, small company by small company.

In the Q&A, we can discuss many things

How can Reprex help an individual creator in music, or in fashion and design, or any other area?
What sort of help can it give to researchers, research institutes, specialist consultancies, law firms, and other knowledge-based actors?

What sort of partners is Reprex looking for in Eindhoven?

Check out our projects

Digital Music Observatory and Listen Local
Cultural & Creative Sectors and Industries Observatory and short call for potential partners.
Green Deal Data Observatory and simple, connected, financial and sustainability reporting for creative enterprises and others

Reprex: the impact startup

Check out our accomplishments since the foundation in 2020

Digital Music Observatory on the MaMA Convention 2021, Paris, FR

Thu, 14 Oct 2021 11:00:00 +0000

Currently more than half of the global music sales are made by autonomous AI systems owned by Google, Apple, or Spotify. These data monopolies are getting rich because they reap the profit from music businesses with an average employee count of 1.8 in Europe. European music businesses are easy to exploit with armies of data engineers and data scientists because they do not have a single data scientist or even an IT function.

Artists in the UK had difficulty explaining in Westminster how they are losing out in streaming, so we created a streaming price index (like the Dow Jones, if you like) that explains the economic factors behind the devaluation of music over the last 5 years in 20 countries. (See our report.)
Music organizations in Slovakia and Hungary were frustrated that their politicians and journalists believed music to be taxpayer funded, so we showed with data that they contribute proportionally more to the national budget than car manufacturers, the darling of local politicians. (See our reports in Hungary (recast several times) and in Slovakia.)
With data, we successfully challenged restaurant associations, hotel chains, telecom corporations and broadcasters who wanted to bring music prices down in court and via lobbying.

The music industry has envied the television and film industry, which has a single go-to point for data when it needs them: the European Audiovisual Observatory. It started lobbying for a publicly financed music observatory. But we did not wait. The music industry has a tragic track record of failed centralized international data projects. We built Reprex out of a 12-country, decentralized music project. We learned how to make good use of hidden but already existing data and research funds, and how to manage data governance among the poisonous conflicts of interest between rich and poor countries, authors vs producers, and producers vs performers.

Our Digital Music Observatory is not theoretical, it is practical, because it is built around real-life court cases, damage claims, lobbying and PR arguments.
Our Digital Music Observatory is comprehensive – it contains more than a thousand indicators from all European countries. We have enough data to test the biases of the Spotify or the YouTube algorithm – you would be surprised what the data tells us.
It has data available much sooner, in much higher quality, and in a more practical format than the audiovisual one.

Presentation Slides

You can see the presentation slides here.

Economic and Environment Impact Analysis, Automated for Data-as-Service

Thu, 03 Jun 2021 16:00:00 +0000

We have released a new version of iotables as part of the rOpenGov project. The package, as the name suggests, works with European symmetric input-output tables (SIOTs). SIOTs are among the most complex governmental statistical products. They show how each country’s 64 agricultural, industrial, service, and sometimes household sectors relate to each other. They are estimated from various components of GDP and tax collection at least every five years.

SIOTs offer great value to policy-makers and analysts who want to make more than educated guesses about how a million euros, pounds or Czech korunas spent on a certain sector will impact other sectors of the economy, employment or GDP. What happens when a bank starts to give new loans and advertise them? How is an increase in economic activity going to affect the amount of wages paid, and where will consumers most likely spend their wages? As the national economies begin to reopen after the COVID-19 pandemic lockdowns, one timely application is to use SIOTs to calculate the direct and indirect employment effects or value added effects of government grant programs to sectors such as cultural and creative industries, or to actors such as venues for performing arts, movie theaters, bars and restaurants.

Making such calculations requires a bit of matrix algebra and an understanding of input-output economics, direct and indirect effects, and multipliers. Economists, grant designers, and policy makers have those skills, but until now such calculations were made either in cumbersome Excel sheets or in proprietary software, because the key to these calculations is to keep vectors and matrices, which have at least one dimension of 64, perfectly aligned. We made this process reproducible with iotables and eurostat on rOpenGov.

Our iotables package creates direct effects, indirect effects and multipliers programmatically. Our observatory will make those indicators available for all European countries.

Accessing and tidying the data programmatically

The iotables package is in a way an extension to the eurostat R package, which provides programmatic access to the Eurostat data warehouse. The reason for releasing a new package is that working with SIOTs requires plenty of meticulous data wrangling based on various metadata sources, apart from actually accessing the data itself. When working with matrix equations, the bar is higher than with tidy data. Not only must your rows and columns match, but their ordering must strictly conform to the quadrants of a matrix system, including the connecting trade or tax matrices.

When you download a country’s SIOT table, you receive a long form data frame, a very-very long one, which contains the matrix values and their labels like this:

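# download the SIOT table with the eurostat package (this produces the cache message below)
naio_10_cp1700 <- eurostat::get_eurostat("naio_10_cp1700")
## Table naio_10_cp1700 cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds
# we save it for further reference here
saveRDS(naio_10_cp1700, "not_included/naio_10_cp1700_date_code_FF.rds")
# should you need to retrieve the large tempfiles, they are in
dir(file.path(tempdir(), "eurostat"))
# show the first five rows of the long-form table
dplyr::slice_head(naio_10_cp1700, n = 5)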
## # A tibble: 5 x 7
## unit stk_flow induse prod_na geo time values
## <chr> <chr> <chr> <chr> <chr> <date> <dbl>
## 1 MIO_EUR DOM CPA_A01 B1G EA19 2019-01-01 141873.
## 2 MIO_EUR DOM CPA_A01 B1G EU27_2020 2019-01-01 174976.
## 3 MIO_EUR DOM CPA_A01 B1G EU28 2019-01-01 187814.
## 4 MIO_EUR DOM CPA_A01 B2A3G EA19 2019-01-01 0
## 5 MIO_EUR DOM CPA_A01 B2A3G EU27_2020 2019-01-01 0

The metadata reads like this: the units are in millions of euros, we are analyzing domestic flows, and the national account items B1-B2 for the industry A01. The information of a 64x64 matrix (the SIOT) and its connecting matrices, such as taxes, employment, or CO₂ emissions, must be placed in exactly one correct ordering of columns and rows. Every single data wrangling error will usually lead to an error (the matrix equation has no solution) or, what is worse, to an algebraic error that is very difficult to trace. Our package not only labels this data meaningfully, but also creates very tidy data frames that contain each necessary matrix or vector with a key column.

The iotables package contains the abbreviations and human-readable labels of three statistical vocabularies: the so-called COICOP product codes, the NACE industry codes, and the vocabulary of the ESA2010 definition of national accounts (which is the government equivalent of corporate accounting).
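To illustrate why the ordering matters, here is a minimal base R sketch that turns a tiny long-form table into a square matrix with one fixed ordering of rows and columns. The two product codes, the values, and the object names `long`, `ord` and `M` are hypothetical; this is not how iotables does it internally, only an illustration of the idea.

```r
# Hypothetical long-form excerpt: rows arrive in arbitrary order.
long <- data.frame(
  prod_na = c("CPA_C29", "CPA_A01", "CPA_C29", "CPA_A01"),  # row labels
  induse  = c("CPA_C29", "CPA_C29", "CPA_A01", "CPA_A01"),  # column labels
  values  = c(40, 20, 15, 10)
)

# One agreed ordering, used for BOTH rows and columns.
ord <- c("CPA_A01", "CPA_C29")
long$prod_na <- factor(long$prod_na, levels = ord)
long$induse  <- factor(long$induse,  levels = ord)

# Spread the values into a matrix; factor levels fix the ordering.
M <- tapply(long$values, list(long$prod_na, long$induse), sum)

# Guard against silent misalignment before any matrix algebra.
stopifnot(identical(rownames(M), ord), identical(colnames(M), ord))
print(M)
```

Fixing the ordering through factor levels, rather than relying on the order in which rows happen to arrive, is one simple way to keep connecting vectors (such as employment or emissions) aligned with the matrix.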

Our package currently solves all equations for direct, indirect effects, multipliers and inter-industry linkages. Backward linkages show what happens with the suppliers of an industry, such as catering or advertising in the case of music festivals, if the festivals reopen. The forward linkages show how much extra demand this creates for connecting services that treat festivals as a ‘supplier’, such as cultural tourism.

Let’s see an example

## Downloading employment data from the Eurostat database.
## Table lfsq_egan22d cached at C:\Users\...\Temp\RtmpGQF4gr/eurostat/lfsq_egan22d_date_code_FF.rds

We match the employment data with the latest structural information from the Symmetric input-output table at basic prices (product by product) Eurostat product. A quick look at the Eurostat website already shows that there is a lot of work ahead to make the data look like an actual symmetric input-output table. Download it with iotable_get(), which does basic labelling and preprocessing on the raw Eurostat files. Because of the size of the unfiltered dataset on Eurostat, the following code may take several minutes to run.

# download and label the 2015 Slovak SIOT in millions of euros
sk_io <- iotable_get(labelled_io_data = NULL,
                     source = "naio_10_cp1700", geo = "SK",
                     year = 2015, unit = "MIO_EUR",
                     stk_flow = "TOTAL",
                     labelling = "iotables")
## Reading cache file C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds
## Table naio_10_cp1700 read from cache file: C:\Users\..\Temp\RtmpGQF4gr/eurostat/naio_10_cp1700_date_code_FF.rds
## Saving 808 input-output tables into the temporary directory
## C:\Users\...\Temp\RtmpGQF4gr
## Saved the raw data of this table type in temporary directory C:\Users\...\Temp\RtmpGQF4gr/naio_10_cp1700.rds.

The input_coefficient_matrix_create() function creates the input coefficient matrix, which is used for most of the analytical functions.

$a_{ij} = X_{ij} / x_j$

It checks the correct ordering of columns, and furthermore it replaces 0 values with 0.000001 to avoid division by zero.

input_coeff_matrix_sk <- input_coefficient_matrix_create(
  data_table = sk_io
)
## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.
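As a minimal sketch of the input coefficient formula, the following base R snippet computes the coefficients for a toy three-sector economy. The flow matrix `X`, the output vector `x`, and all numbers are hypothetical; the package function does considerably more (labelling, ordering checks, zero handling).

```r
# Toy inter-industry flow matrix X (hypothetical numbers, not Eurostat data).
sectors <- c("agriculture", "manufacturing", "services")
X <- matrix(
  c(10, 20,  5,
    15, 40, 10,
     5, 30, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(sectors, sectors)
)

# Total output x_j of each sector (hypothetical).
x <- c(agriculture = 100, manufacturing = 200, services = 150)

# Rows, columns and the output vector must share one ordering.
stopifnot(identical(rownames(X), colnames(X)),
          identical(colnames(X), names(x)))

# a_ij = X_ij / x_j: divide every column j by the output of sector j.
A <- sweep(X, MARGIN = 2, STATS = x, FUN = "/")
print(round(A, 3))
```

Each column of `A` then reads as a recipe: the inputs from every sector needed to produce one unit of output in the column's sector.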

Then you can create the Leontief inverse, which contains all the structural information about the relationships among the 64x64 sectors of the chosen country, in this case Slovakia, ready for the main equations of input-output economics.

I_sk <- leontieff_inverse_create(input_coeff_matrix_sk)
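For intuition, the textbook Leontief inverse is (I - A)^(-1), where A is the input coefficient matrix. The sketch below computes it in base R for a hypothetical three-sector coefficient matrix; the numbers and object names are illustrative assumptions, and the package function additionally handles labels and key columns.

```r
# Hypothetical input coefficient matrix A (each column sums to less than 1).
sectors <- c("agriculture", "manufacturing", "services")
A <- matrix(
  c(0.10, 0.10, 0.05,
    0.15, 0.20, 0.05,
    0.05, 0.15, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(sectors, sectors)
)

# Leontief inverse: L = (I - A)^(-1).
I_minus_A <- diag(nrow(A)) - A
L <- solve(I_minus_A)

# Sanity check: (I - A) %*% L should reproduce the identity matrix.
residual <- max(abs(I_minus_A %*% L - diag(nrow(A))))
stopifnot(residual < 1e-12)
print(round(L, 3))
```

Element (i, j) of `L` is the total output required from sector i, directly and through all supply chains, to deliver one unit of final demand for sector j.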

And take out the primary inputs:

primary_inputs_sk <- coefficient_matrix_create(
  data_table = sk_io,
  total = 'output',
  return = 'primary_inputs')
## Columns and rows of real_estate_imputed_a, extraterriorial_organizations are all zeros and will be removed.

Now let’s see what happens if the government tries to stimulate the economy in three sectors, agriculture, car manufacturing, and R&D, with a billion euros. Direct effects measure the initial, direct impact of the change in demand and supply for a product. When production goes up, it will create demand in all supply industries (backward linkages) and create opportunities in the industries that use the product themselves (forward linkages).

library(dplyr)  # provides %>%, select() and filter()

# effects of extra demand on imports, wages, GVA and output in three sectors
direct_effects_create(primary_inputs_sk, I_sk) %>%
  select(all_of(c("iotables_row", "agriculture",
                  "motor_vechicles", "research_development"))) %>%
  filter(.data$iotables_row %in% c("gva_effect", "wages_salaries_effect",
                                   "imports_effect", "output_effect"))
## iotables_row agriculture motor_vechicles research_development
## 1 imports_effect 1.3684350 2.3028203 0.9764921
## 2 wages_salaries_effect 0.2713804 0.3183523 0.3828014
## 3 gva_effect 0.9669621 0.9790771 0.9669467
## 4 output_effect 2.2876287 3.9840251 2.2579634

Car manufacturing requires many imported components, so each unit of extra demand will create a large importing activity. R&D will create the most local wages (and support the most jobs) because research is job-intensive. As we can see, the effects on imports, wages, gross value added (which will end up in the GDP) and output are very different in these three sectors.

This is not the total effect, because some of the increased production will translate into income, which in turn will be used to create further demand in all parts of the domestic economy. The total effect is characterized by multipliers.

Then solve for the multipliers:

multipliers_sk <- input_multipliers_create(
primary_inputs_sk %>%
filter (.data$iotables_row == "gva"), I_sk )

And select a few industries:

set.seed(12)  # make the random sample of industries reproducible
multipliers_sk %>%
  tidyr::pivot_longer(-all_of("iotables_row"),
                      names_to = "industry",
                      values_to = "GVA_multiplier") %>%
  select(-all_of("iotables_row")) %>%
  arrange(-.data$GVA_multiplier) %>%
  dplyr::sample_n(8)
## # A tibble: 8 x 2
## industry GVA_multiplier
## <chr> <dbl>
## 1 motor_vechicles 7.81
## 2 wood_products 2.27
## 3 mineral_products 2.83
## 4 human_health 1.53
## 5 post_courier 2.23
## 6 sewage 1.82
## 7 basic_metals 4.16
## 8 real_estate_services_b 1.48
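To show the arithmetic behind such numbers, here is a hedged base R sketch for a toy three-sector economy. It assumes the textbook convention, as in the Eurostat Manual, that the effect of one unit of final demand is the coefficient row multiplied by the Leontief inverse, and that the multiplier is this effect divided by the sector's own coefficient. The matrix `A`, the GVA coefficients `v`, and all numbers are hypothetical, and the snippet has not been checked against the package internals.

```r
# Hypothetical input coefficient matrix and its Leontief inverse.
sectors <- c("agriculture", "manufacturing", "services")
A <- matrix(
  c(0.10, 0.10, 0.05,
    0.15, 0.20, 0.05,
    0.05, 0.15, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(sectors, sectors)
)
L <- solve(diag(nrow(A)) - A)

# Hypothetical GVA coefficients: gross value added per unit of output.
v <- c(agriculture = 0.5, manufacturing = 0.3, services = 0.6)

# GVA effect of one unit of final demand in each sector: v %*% L.
gva_effect <- drop(v %*% L)

# GVA multiplier: total effect relative to the sector's own coefficient.
gva_multiplier <- gva_effect / v

print(round(rbind(gva_effect, gva_multiplier), 3))
```

Under this convention a sector with a small own coefficient but strong domestic supply chains gets a large multiplier, which would be consistent with the high value for motor vehicles in the output above.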

Vignettes

The Germany 1990 vignette provides an introduction to input-output economics and re-creates the examples of the Eurostat Manual of Supply, Use and Input-Output Tables, by Jörg Beutel (Eurostat Manual).

The United Kingdom Input-Output Analytical Tables vignette by Daniel Antal, based on the work edited by Richard Wild, is a use case on how to correctly import data from outside Eurostat (i.e. not with eurostat::get_eurostat()) and join it properly to a SIOT. We also used this example to create unit tests of our functions from a published, official government statistical release.

Finally, Working With Eurostat Data is a detailed use case that works with all the current functionalities of the package by comparing two economies, Czechia and Slovakia, and guides you through a lot more examples than this short blogpost.

Our package was originally developed to calculate GVA and employment effects for the Slovak music industry (see our Slovak Music Industry Report), and similar calculations for the Hungarian film tax shelter. We can now programmatically create reproducible multipliers for all European economies in the Digital Music Observatory, and create further indicators for economic policy making in the Economy Data Observatory.

Environmental Impact Analysis

Our package allows the calculation of various economic policy scenarios, such as the effects of changing the VAT on meat or of re-opening music festivals on aggregate demand, GDP, tax revenues, or employment. But what about the CO₂, methane and other greenhouse gas effects of reopening festivals, or of increasing meat prices?

Technically our package can already calculate such effects, but to do so, you have to carefully match further statistical vocabulary items used by the European Environment Agency about air pollutants and greenhouse gases.

The last released version of iotables is Importing and Manipulating Symmetric Input-Output Tables (Version 0.4.4). Zenodo. https://doi.org/10.5281/zenodo.4897472, but we are already working on a new major release. In that release, we are planning to build the necessary vocabulary into the metadata functions to increase the functionality of the package, and to create new indicators for our Green Deal Data Observatory. This experimental data observatory is creating new, high quality statistical indicators from open governmental and open science data sources that have not seen the daylight yet.

rOpenGov and the EU Datathon Challenges

rOpenGov, Reprex, and other open collaboration partners teamed up to build further on our expertise in open source statistical software development: we want to create a technologically and financially feasible data-as-service to put our reproducible research products into wider use for the business analyst, scientific researcher and evidence-based policy design communities.

rOpenGov is a community of open governmental data and statistics developers with many packages that make programmatic access to, and work with, open data possible in the R language. Reprex is a Dutch startup that teamed up with rOpenGov and other open collaboration partners to create a technologically and financially feasible service to exploit reproducible research products for the wider business, scientific and evidence-based policy design community. Open data is a legal concept: it means that you have the right to reuse the data, but often the reuse requires significant programming and statistical know-how. We entered all three challenges of the annual EU Datathon competition with applications that provide not only open-source software, but also daily updated, validated, documented, high-quality statistical indicators as open data in an open database. Our iotables package is one of our many open-source building blocks to make open data more accessible to all.

Join our open collaboration Digital Music Observatory team as a data curator, developer or business developer. More interested in environmental impact analysis? Try our Green Deal Data Observatory team! Or economic policies, particularly computational antitrust, innovation and small enterprises? Check out our Economy Data Observatory team!

Reprex Open Data Day 2021

Sat, 06 Mar 2021 15:30:00 +0200

Open Data Day is an annual celebration of open data all over the world. It is an opportunity to show the benefits of open data and encourage the adoption of open data policies in government, business, and civil society. Reprex is a start-up that utilizes open data with open-source reproducible research: please challenge us with your data requests and participate in our web events.

The Reprex Open Data Day 2021 will be two informal conversations based on a series of run up introductory blogposts centered around two themes. Because important guests became ill in the last days, we are going to consolidate the two talks into one with less structure. We want to create an informal, inclusive, collaborative online event on International Open Data Day 2021. Please, grab a tea, coffee, or even a beer, and join us for an informal conversation. We hope that we will finish the afternoon with ideas on new, open-data driven collaborations.

9.30 EST / 15.30 CET: Open collaboration in business, policy and science. Creating evidence-based policy, business strategy or scientific research with small contributions with independent components with incentives. Short introduction with examples: joining environmental sensory data and public opinion data on maps; creating harmonized datasets across the Arab world. Survey harmonization, mapping, data products. Scaling up open collaboration: making small organizations competitive with big tech in the big data era. Data sharing, data pooling, data altruism and observatories. The new European trustworthy AI and data governance agenda.

You can click through a short presentation to familiarize yourself with our topics.

See you here.

Case studies:

We are connecting raw survey data about Climate Awareness in Eurobarometer surveys. Here is the reproduction code (intermediate to advanced R needed.) You should use the development version of our retroharmonize package at github.com/antaldaniel/retroharmonize
We are tracking changes in the boundaries of provinces, states, counties, parishes with our regions open source software – reproduction code here. You will need our regions package which is available on CRAN or in the rOpenGov GitHub repo.
We will talk about how to join this with air pollution data and put it on the map with Milos Popovic, who prepared this nice choropleth animation.

We will discuss data observatories (permanent data collection programs), open collaboration (open-source inspired way of cooperation among small and large independent actors) and data altruism.

Any questions: send Daniel a message on Keybase, Whatsapp or email.

Reprex introduction in IVIR, Amsterdam, NL

Tue, 02 Feb 2021 10:10:00 +0000

IViRtual 9 April 2021

Product/Market Fit Validation in Yes!Delft

Fri, 25 Sep 2020 15:31:39 +0000

We would like to validate our product/market fit in two segments, business/policy research and scientific research, with a supporting role given to data journalism. Because we want to follow a bootstrapping strategy, we must focus on those clients where we find the highest value proposition, which is of course easier said than done. We see much interest in our offering from other continents, therefore we truly welcome the opportunity to do this on a truly global business canvas in one of the world’s top five incubators, the number 2 university-backed incubator in the world, second to none in Europe, in the Yes!Delft AI+Blockchain Validation Lab.

In Europe hundreds of thousands of microenterprises, such as record labels, video producers or book publishers are facing data and AI giants like Google’s YouTube, Apple Music, Spotify, Netflix or Amazon. If the recommendation engines of these giants do not recommend their songs, films or books, then their investments are doomed to fail, because about half of the global sales are driven by AI algorithms. When they make a claim for the missing money, they will immediately find themselves in a dispute with gigabytes of data that they can only handle with a data scientist, even though they do not even have an IT professional or an HR professional to make the hire.

An awful lot of money, creativity and real value is at stake, and we want to be on the creators’ side, their technicians’ side, and their managers’ side when they want to get a fair share of the pie and when they want to help these industry leaders make the pie grow.

UNESCO and the EU have been promoting so-called data observatories as an organizational solution to the fragmentation problem. These observatories pool the business, policy, and scientific research needs of various domains, like music. This is an idea that we really like, and we believe that our research automation solutions can help these observatories to grow faster as ecosystems and to create better quality and more timely data and research products at a far lower cost.

We define ourselves as a reproducible research company inspired by the philosophy of open collaboration, based on open-source software and open data. We want to explore various revenue models around these ideas.

We are not committed to open source licensing if more permissive licensing policies provide us with better opportunities.
We would like to explore various data-as-service models, because we do not want to be locked into the position of cheap open data vendors.
We want to deploy AI applications that really help earn money in these sectors with playlisting, recommendation engines, forecasting applications, or royalty valuations, because our open collaboration approach brings up enough data sooner than its alternatives: it manages inherent conflicts of interest, fragmentation, and decentralization better than hierarchical solutions.

Timeline

In January CEEMID reached its peak: we introduced a 12-country reproducible research project made with only freelancers in Brussels, presented as best use case of evidence-based policy design.
In February Daniel visited the Yes!Delft Co-Lab to find out who would be the best co-founder to re-launch CEEMID as an enterprise.
In April we started to release our data as open data for validation.
One month ago we started-up.
Then we launched the music.dataobservatory.eu project.
A few other data observatories.

Bonus:

Palato in the Hague, where we took our selfie and had an absolutely amazing dinner after the pitch. Check them out!

Reproducible Survey Harmonization: retroharmonize Is Released

Mon, 21 Sep 2020 11:31:39 +0000

Our original intention was to make surveying more accessible for music and creative industry partners by relying more on already existing survey data and by better designing complementary, smaller surveys, because surveying and opinion polling are becoming increasingly expensive in the developed world. People are less and less likely to sit down for an interview in their houses. We have tried to harmonize our custom surveys, particularly with Kantar in Hungary and Focus in Slovakia, with existing EU projects. But we ended up making a part of international survey harmonization across countries and throughout years easier to automate.

Surveys are like sensors for natural sciences and industrial production. They are essential for almost any social and economic statistical indicator, for calculating the inflation, parts of the GDP, participation in education programs. Making surveys easier to harmonize and exploit more already existing survey data can bring down research cost, and can increase research value at the same time. (See our earlier blog post Increase The Value Of Market Research With Open Data And Survey Harmonization.)

So, if you are an R user, you can use install.packages("retroharmonize") to get the released 0.1.13 version and make tutorials with real Eurobarometer or Afrobarometer microdata. With devtools::install_github("antaldaniel/retroharmonize") you can already install the current development version 0.1.14, which handles perl-like regex, which will be necessary for our next tutorial in the making for Arab Barometer.

Launching Our Demo Music Observatory

Tue, 15 Sep 2020 08:00:39 +0000

Today, on 15 September 2020, we officially launched our minimal viable product as we promised to partners back in February. This was a particularly difficult period for everybody. We aspired to deliver by September in a very different environment, our hopes for commissioned work went up in flames with the pandemic, and our targeted users, musicians and music entrepreneurs, talent managers, music venues lost most of their income. The organizations helping them, granting authorities, export offices and collective management societies are overwhelmed with the problem. During these troublesome times, our team expanded, attracted great new talent, and kept working.

Our first product is the Demo Music Observatory, a collaborative, automated research-based observatory for the music industry, one that is particularly hard hit by the COVID-19 crisis. Not only did great artists, composers, technicians, and managers fall victim to the virus, but musicians lost about 50–90% of their income from live music. This translates to a 100% loss for the live music technicians and managers.

See our earlier blogpost on what you see on the video.

The music industry was never a place for great job security. For putting up a show, you usually need a network of 10–200 artists, technicians and managers to work together as freelancers without all those social benefits that many people enjoy in other walks of life. We have been trying to figure out how to help this microenterprise and freelancer-network based industry with research for five years. Our aim is to make them competitive when they are talking with their buyers: Google, Apple, Spotify, who are really heavy-weight data and AI pros. Or to help them better plan their tours when they are back on the road, and to understand what sort of audiences and purchasing power await them in different European cities.

We are launching at a time when the music industry is crying for help. Therefore, we have decided to make our demo observatory open and unfinished. Over the last 7 years, we have built up about 2000 music and creative sector indicators to be used for business KPIs, forecasting targets, grant evaluations, royalty valuations, concert demography target group analysis and other professional uses. We would like to open up, based on your needs, about 50 well-designed indicators, and pledge to keep them refreshed daily, corrected, documented, citable, and downloadable. Also, feel free to use our most valuable source code: use it for your own purposes, even modify it, as long as you keep it open.

For our smaller partners, we follow what musicians do these days on Bandcamp: name your price. We make a pledge to our small partners: if you need reliable data to plan your next grant calls, calculate royalties or compensations, or predict hit candidates, give us the job and name your price. Post-corona, you can take the best music from Bandcamp for a dollar. You can take our research products, for a limited period, for any amount you name, as long as it is for a good cause and serves the industry, musicians, technicians or managers. In return, we ask for your feedback. Help us validate whether we are on the right track, and tell us how we can cooperate after the pandemic, in better times.

Our larger and better funded partners? We ask you to pay the price we name, because we believe that it is a well-justified, fair and competitive price, set by pricing experts.

We appreciate it if you take a look at our offering, or if you pass this blogpost on to your colleagues in the industry. Our main target audience initially is music professionals in broader Europe, but we are planning to cover all major global markets very soon, too. Feedback from the U.S., Australia, Canada, Colombia, Brazil & Argentina is particularly welcome as we have great plans over there!

Who are we?

We started our operations on 1 September 2020 on the basis of CEEMID, a pan-European data observatory that created about 2000 music and creative industry indicators for its users. In the coming days, we are gradually opening up about 50 music industry and 50 broader creative industry indicators in a fully reproducible workflow, with daily re-freshed, re-processed, well-formatted and documented indicators for business and policy decisions.

We would like to validate this approach in one of the world’s most prestigious university-backed incubator programs, the Yes!Delft AI/Blockchain Validation Lab. We are finalists in their selection, and any help from our friends in the music industry before 23 September is more than appreciated. If we get there, we can rely on probably the best pros in Europe to make our offering better tailored and financially sustainable.

Get in touch!

We use the very simple and extremely secure keybase.io, a kind of mix of WhatsApp, Skype, Google Drive, OneDrive and Zoom. You can get in touch with us on that platform at any time here.

You can easily contact Daniel or Kátya on LinkedIn, and of course we have an email contact form that usually works, too. Our email is name.surname at our main domain.

Video credits

Data acquisition and processing: Daniel Antal, CFA and Marta Kołczyńska, PhD (survey data).
Documentation automation: Sandor Budai
Video art: Line Matson
Music: Moon Moon Moon.

Creating An Automated Data Observatory

Fri, 11 Sep 2020 16:00:39 +0000

We are building data ecosystems, so-called observatories, where scientific, business, policy and civic users can find factual information, data and evidence for their domain. Our open source, open data, open collaboration approach allows us to connect various open and proprietary data sources, and our reproducible research workflows allow us to automate data collection, processing, publication, documentation and presentation.

Our scripts check data sources, such as Eurostat’s Eurobase, Spotify’s API and other music industry sources, every day for new information. They process any data corrections or new disclosures, interpolate, backcast or forecast missing values, and make currency translations and unit conversions. This is illustrated in an earlier post.
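To make two of these processing steps concrete, here is a minimal Python sketch of gap filling and currency translation. It is an illustration under stated assumptions, not our production code: the function names, the sample series and the exchange rate are all hypothetical, and plain linear interpolation stands in for whatever method a given indicator actually needs.

```python
from typing import Optional


def interpolate_linear(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps (None) on a straight line between known neighbours.

    Leading and trailing gaps stay None: filling them would be backcasting
    or forecasting, which needs a model rather than a straight line.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def convert_currency(values: list[Optional[float]], rate: float) -> list[Optional[float]]:
    """Multiply every known value by a single exchange rate (an assumption:
    real series usually need one rate per period)."""
    return [None if v is None else v * rate for v in values]


# Hypothetical yearly indicator in national currency; None marks missing years.
series = [None, 100.0, None, None, 130.0, 140.0]
filled = interpolate_linear(series)
in_eur = convert_currency(filled, rate=0.5)  # hypothetical exchange rate

print(filled)
print(in_eur)
```

The first year stays missing on purpose: nothing precedes it, so filling it would be a backcast rather than an interpolation.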

For direct access to the file visit this link.

In the video, we show how the creation of an observatory website with well-formatted statistical data dissemination, a technical document in PDF and an ebook can be automated. In our view, our technology is particularly useful in business and scientific research projects, where it is important that the most timely and correct data is always analyzed, and that it remains automatically documented and cited. We are ready to deploy public, collaborative, or private data observatories at short notice.

Data processing costs can be as high as 80% of the total cost of any in-house AI deployment project. We work mainly with organizations that do not have an in-house data science team and acquire their data from outside the organization anyway. In their case, this rate can be as high as 95%, meaning that getting and processing the data for deploying AI can be about 20 times more expensive than the AI solution itself.
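The arithmetic behind these ratios can be checked with a short Python sketch. It assumes that whatever is not spent on data is spent on the AI solution itself; the function name is hypothetical.

```python
def data_to_ai_cost_ratio(data_share: float) -> float:
    """Return how many times the data work costs compared with the rest.

    data_share is the fraction of the total budget spent on acquiring and
    processing data. The remainder is assumed to be the AI solution itself.
    """
    if not 0 <= data_share < 1:
        raise ValueError("data_share must be in the range [0, 1)")
    return data_share / (1 - data_share)


for share in (0.80, 0.95):
    ratio = data_to_ai_cost_ratio(share)
    print(f"{share:.0%} data share: data costs {ratio:.1f}x the AI solution")
```

An 80% share gives a ratio of 4 to 1, and a 95% share gives 19 to 1, which is the roughly twentyfold difference quoted above.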

AI solutions require a large amount of standardized, well-processed data to learn from. We want to radically decrease the cost of data acquisition and processing for our users so that exploiting AI comes within their reach. This is particularly important in one of our target industries, the music industry, where most global sales are algorithmic and AI-driven. Artists, bands, small labels, publishers, and even the national associations of small countries cannot remain competitive if they cannot participate in this technological revolution.

We would like to validate this approach in one of the world’s most prestigious university-backed incubator programs, the Yes!Delft AI/Blockchain Validation Lab.

Video credits

Data acquisition and processing: Daniel Antal, CFA and Marta Kołczyńska, PhD (survey data).
Documentation automation: Sandor Budai
Video art: Line Matson
Music: Moon Moon Moon.

Starting-up

Mon, 24 Aug 2020 10:15:00 +0000

The big day has come: the co-founders signed the documents at the public notary and started the registration of a reproducible research start-up in Leiden. We got a lot of support from our friends! Your encouragement gives us a lot of energy to accomplish our first milestones, and to get Reprex B.V. going!

Reprex means ‘reproducible example’ in data science. When you are stuck with a problem, creating a reproducible example allows other computer scientists, statisticians, programmers or data users to solve it. In 80% of the cases, you find the solution yourself while creating a generalized example. In the other 20%, you can easily reach out for help.

In the coming days, we are launching demo versions of our headline products, data observatories. music.dataobservatory.eu will be a fully automated online service that collects, processes, cleans, and publishes scientifically valid data about European music every day. Very soon afterwards, we will launch two other observatories.

Organizations in the creative and cultural sector, NGOs, most research institutions and data journalism teams are usually very small, and they do not have internal IT or data science capacities. We would like to provide them with a transparent, high-quality, and fully open source solution to acquire data, process it without errors, document it and make sense of it. With our work, we would like to embrace the idea of open collaboration among creative enterprises, scientific researchers, NGOs, data journalists and policymakers.

Our work will comply with the Open Policy Analysis standards developed by the Berkeley Initiative for Transparency in the Social Sciences & Center for Effective Global Action and the four principles of reproducible research: reviewability, replicability, confirmability and auditability. We believe that these standards apply in reproducible finance, empirical evidence presentation in courts, or advocating sound policies and producing high-quality journalism.

Do you want to help our start?

We would like to enter the Validation Lab of one of the best artificial intelligence incubators in early September. Talented team members, letters of intent and assignments from organizations will give a lot of credibility to our start. Meet our team ».

Put us in contact with people who love to write code in R and are interested in automating business and social science research and primary data collection such as surveying. Check out what sort of code we create »
Introduce us to people who need data and information to make better-informed decisions and analyses in music, film, book publishing, photography services or socially responsible finance.
Share contacts of data journalists who would like to develop stories from big survey programs like Eurobarometer, Afrobarometer and Latinobarómetro, or base their storytelling on data and its visualizations. See our survey harmonization examples »

Do you know such people? Send over this post or connect us in an email or social media message!

Thanks again for your good wishes and encouragement. We hope to hear from you soon!