HR Data Science: Salary Prediction at Trivago and StepStone

Dear guest!

On Tuesday evening, 29 August 2023, I attended the Düsseldorf Data Science Meetup at Trivago, the online hotel search company, in its extraordinary headquarters in the Medienhafen of Düsseldorf, the city where I was born.

I left my office in Solingen early that afternoon because I was meeting my business friend Dominik Rühl before the event, and I walked in the sunshine from the Düsseldorf Landtag (state parliament) on the Rhine to our meeting point at the UCI cinema near Trivago.

Dominik Rühl (right) with me at the meetup before the talks. Photo: Stefan Klemens (thanks to him for permission to publish)

It was good to see Dominik after about five years, and we had a fruitful exchange on our shared topics of artificial intelligence (AI), recruitment, skills, and digital assessment, as well as some private matters. He now works as an HR & Recruiting Manager at Advance Business Partner GmbH, based nearby in the city of Neuss on the other side of the Rhine. The consulting company focuses on mobility services in areas such as recruitment, innovation, and transformation management.

Although the summer weather in Germany has been rather changeable this year, we enjoyed sitting outside with our drinks at the unique brewery bar Eigelstein.

Find out more about the Düsseldorf Data Science Meetup Group with its interests in Data Science, Machine Learning and Python/R, on this website.

Arriving at Trivago

The Trivago building as seen from the north-east of the Medienhafen Düsseldorf. Photo: Stefan Klemens

At 6 pm it was time to walk to the nearby Trivago building, completed in 2018. The distinctive modern entrance area and the café behind it offer a glimpse of how the interior of the building is designed (see this article and this article about the New Work culture at Trivago and the architecture of the headquarters' spaces).

Surprisingly, we and one other guest were the first participants to arrive (OK, it was half an hour before the official start, and the talks began even later), but Gina from Trivago soon picked us up. Together with a cart full of pizza in yellow boxes for the data people, we rode one of the elevators to the top floor, where the event took place.

The roof terrace offered a stunning view of the south-western skyline, and Dominik, the arriving participants, and I enjoyed drinks and pizza before the event started at 7 pm.

Our co-host Aida Orujova gave us a very warm welcome, introduced the speakers, and broke the ice by asking who was from data science, who was from engineering, and who was just there to learn more about salaries.

Co-host and moderator of the evening Aida Orujova welcoming the data science crowd. Photo: Stefan Klemens (with her approval)

First talk: Alexander Fischer, Trivago

Alexander Fischer from Trivago opened with a talk about his passion for the programming and statistics software R and about his (and the economists') "Swiss Army knife" method for predicting outcome variables: linear regression. He showed how he and his team used this classical algorithm with the packages fixest (R) and PyFixest (Python) to predict wage from the variables education and ability (e.g. intelligence).

When presenting the problem with this approach (the error term is correlated with an explanatory variable, because ability influences both education and wage), he referred to a recent study using data from 59,000 Swedish men, published in 2023 by Marc Keuschnigg, Arnout van de Rijt, and Thijs Bol in the European Sociological Review (number 20, pages 1-14) under the title “The plateauing of cognitive ability among top earners” (article published online here on January 28, 2023).

Since A/B testing (a randomized design with experimental and control groups) is not feasible here (one cannot, for example, randomly assign individuals to spend an extra year at college), the classical solution in the social sciences and psychology is the quasi-experiment, as described in the standard book “Quasi-Experimentation: Design and Analysis Issues for Field Settings” by Cook and Campbell (1979).

Because years of education cannot be manipulated experimentally, Alexander instead used a variable called “distance to college” as an instrument: it varies naturally between people and influences how many years of education they complete.
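To make the instrument idea concrete, here is a minimal two-stage least squares (2SLS) sketch in Python with NumPy on simulated data. It is not Alexander's code; the variable names (`ability`, `distance_km`), the effect sizes, and the true return of 0.08 log points per year of education are hypothetical assumptions. In the simulation, unobserved ability raises both education and wage, so naive OLS overstates the return to education, while using distance to college as an instrument recovers it.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y ≈ X @ beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def two_stage_least_squares(y, x_endog, z):
    """2SLS point estimate with one endogenous regressor and one instrument.

    Stage 1 regresses the endogenous regressor on the instrument; stage 2
    regresses y on the stage-1 fitted values. Standard errors from the
    naive second-stage OLS would be wrong, so none are computed here.
    """
    const = np.ones_like(y)
    Z = np.column_stack([const, z])
    x_hat = Z @ ols(Z, x_endog)           # stage 1: predicted education
    X2 = np.column_stack([const, x_hat])
    return ols(X2, y)                     # stage 2: [intercept, effect]


# Hypothetical simulation: the analyst does not observe ability.
rng = np.random.default_rng(42)
n = 100_000
ability = rng.normal(size=n)
distance_km = rng.uniform(0, 50, size=n)  # assumed distance to nearest college
education = 12 + 0.8 * ability - 0.05 * distance_km + rng.normal(size=n)
log_wage = 1.0 + 0.08 * education + 0.3 * ability + rng.normal(scale=0.5, size=n)

# Naive OLS: ability sits in the error term and is correlated with education.
beta_ols = ols(np.column_stack([np.ones(n), education]), log_wage)
# 2SLS: distance shifts education but (by assumption) not wages directly.
beta_iv = two_stage_least_squares(log_wage, education, distance_km)

print(f"OLS return to education:  {beta_ols[1]:.3f}")
print(f"2SLS return to education: {beta_iv[1]:.3f}")
```

Under these assumptions the OLS slope lands well above the true 0.08 (ability is absorbed into it), while the 2SLS slope lands close to 0.08. The crucial and untestable assumption is the exclusion restriction: distance to college affects wages only through education.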

On his “The Secret Sauce” slide, the Trivago data scientist further pointed out that accounting for the role of companies in the regression model makes the computation quite demanding (millions of employees, thousands of companies, 20 years of data). Of course, he also presented a solution for it (and it was not Spark!).
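The talk's exact solution is not described here. As a hedged illustration of why fixed-effects packages can cope with thousands of companies, a common technique is to absorb the company effects by demeaning every variable within each company (the within transformation) instead of adding one dummy column per company. The sketch below shows one-way company fixed effects with pandas and NumPy on simulated data; the column names, worker and firm counts, and effect sizes are hypothetical. With several sets of fixed effects (e.g. workers and companies), libraries such as fixest typically repeat such demeaning steps until convergence, which is not shown here.

```python
import numpy as np
import pandas as pd


def within_estimator(df, y, xs, group):
    """One-way fixed-effects slopes via the within (demeaning) transformation.

    Subtracting each group's mean from y and from the regressors removes
    the group fixed effect, so no dummy column per group is needed.
    """
    cols = [y] + xs
    demeaned = df[cols] - df.groupby(group)[cols].transform("mean")
    beta, *_ = np.linalg.lstsq(
        demeaned[xs].to_numpy(), demeaned[y].to_numpy(), rcond=None
    )
    return dict(zip(xs, beta))


# Hypothetical simulation: better-paying firms also hire more educated people.
rng = np.random.default_rng(7)
n_workers, n_firms = 200_000, 2_000
firm = rng.integers(0, n_firms, size=n_workers)
firm_effect = rng.normal(size=n_firms)
education = 12 + 2.0 * firm_effect[firm] + rng.normal(size=n_workers)
log_wage = (1.0 + 0.08 * education + firm_effect[firm]
            + rng.normal(scale=0.3, size=n_workers))

df = pd.DataFrame({"firm": firm, "education": education, "log_wage": log_wage})

# Pooled OLS ignores firms and picks up the firm effect through education.
X = np.column_stack([np.ones(n_workers), education])
pooled, *_ = np.linalg.lstsq(X, log_wage, rcond=None)

# Firm fixed effects absorbed by demeaning within each firm.
fe = within_estimator(df, "log_wage", ["education"], "firm")
print(f"pooled OLS: {pooled[1]:.3f}, firm fixed effects: {fe['education']:.3f}")
```

By the Frisch-Waugh-Lovell theorem, this within estimate equals the slope from a regression with one dummy per firm, but it never builds the 200,000 × 2,000 dummy matrix.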

Finally, Alexander showed how salary prediction can be done in Python with the PyFixest package, and he answered questions from the audience.

Second talk: Michael Matuschek & Tim Elfrink, StepStone

In the second talk of the evening, Michael Matuschek and Tim Elfrink explained how StepStone predicts the salaries of all kinds of jobs for its salary products.

Michael began the session and gave an overview of StepStone's salary products, which include the Salary Planer, Salary on Listings, and auto-generated salary SEO pages.

A 2020 study and earlier research showed that salary is the most important criterion when choosing a job, named by 96 % of respondents (flexible working hours, career and training opportunities, and corporate culture reach only 90 % to 91 %).

Michael Matuschek with Tim Elfrink from StepStone answering a question from the audience. Photo: Stefan Klemens. Thanks to both for approving the picture!

Michael also told us about the challenges of predicting salaries at StepStone: the data distribution and features (for example, many white-collar jobs but little part-time data), the gender pay gap, quality assurance, feature engineering, the underlying model and algorithm, and the metrics (main business KPIs) accuracy and generalisation.

Then Tim Elfrink took the mic and explained the broader infrastructure of the prediction system on AWS and the automated deployment of the model. Other topics of his presentation included creating a scalable infrastructure and development environment.

The participants asked a number of questions (and offered some hints for improving the model), and Michael and Tim happily answered them.

Closing, socks, and outlook

At 8.30 pm, host Aida Orujova returned to the stage and thanked all guests and speakers for coming. Like several others, I took the chance to talk with some participants (see header picture) before I had to catch my tram home.

Trivago logo in front of the building after sunset. Photo: Stefan Klemens

My second Düsseldorf Data Science Meetup was another wonderful experience (read about my first here), and the next event, scheduled for October 2023, is of course on my list.

Oh, and one more thing I haven't mentioned yet (we learned that line from the Apple guy, right?). Before the start, participants could grab one, two, or three promotional gifts from Trivago, as shown in the picture: one for writing by hand (still common among a few people, I was told), one for storing big data in a small piece of metal, and one to keep your feet between 28 °C and 33 °C (the surface temperature of the extremities, as I learned while writing this sentence) when outside temperatures drop in late autumn.

Promotional gifts for the participants of the Düsseldorf Data Science Meetup from Trivago. Photo: Stefan Klemens

As I like to test digital and analogue things (I score high on openness to experience, see the Big Five personality traits, and curiosity is one of my signature strengths according to the VIA model), the trivagonian socks also had to prove their usefulness against cold toes.

Note: If you would like to know more about psychological traits and their psychometric assessment for HR recruiting, selection, and development, take a look at my work as a work psychologist here: https://www.digitalassessment.de/

I can say that my feet got warmer. But the real test, perhaps as a case study (N = 1) with further treatments such as a stepstonian, a sipgatian, and a quantopian fabric as well as a control condition (no treatment, i.e. walking without socks; I'm preparing for that right now!), will be conducted when the colder weather arrives in Germany, which will be soon. I will report on it! 😉 And perhaps you want to join the experiment to raise the “N” so that the results will be more valid?

Me testing the Trivago socks, smiling as I realize the double meaning of the words and symbols, which match the company's two main areas. Photo: Stefan Klemens

With this admittedly rather funny ending, many thanks to the organizers and speakers for this evening, and to Trivago for hosting the meeting! Will we see each other at the next Düsseldorf Data Science Meetup (or somewhere else, if you like)?

Many greetings and all the best to you!

Stefan Klemens

PS: Want to exchange ideas on people analytics, digital assessment, or artificial intelligence in HRM? Then connect with me, send a message, and/or book an online meeting. Or the classic way: a phone call.

And: Do you like my work and the content I regularly share? Then I'd be happy about a like or a comment on LinkedIn. Thank you! 🙂 🙋‍♂️🌳


Generative AI Conference at WHU in Düsseldorf

Dear guest!

Exciting event: Generative AI Conference at WHU – Otto Beisheim School of Management, Campus Düsseldorf, on Friday, September 22, 2023, 09:30 – 19:00 CEST. Organized by the WHU Entrepreneurship Roundtable.

Speakers include, among others: Edip Saliba (Microsoft), Hamidreza Hosseini (Ecodynamics), Frank Tepper-Sawicki, EMBA (Dentons), and Dr. Christopher Smolka (Scale-up.NRW, WHU).

More information is available on the organizers' LinkedIn profile. The program and tickets are available via Eventbrite.

I got my ticket today. What about you?

Are you going to the event too? Then let's meet and talk! And if you cannot join but are interested in HR tech such as AI, People Analytics, Digital Assessment, and their applications, connect with me and let's talk.

All the good and best wishes!

Stefan Klemens

People Analytics: Düsseldorf Data Science Meetup at trivago

Great Talk: HR Data Science / People Analytics at trivago in Düsseldorf on August 29, 2023!

Dear guest!

I am looking forward to attending the next meeting of the Düsseldorf Data Science Meetup Group at trivago (LinkedIn profile) on Tuesday, August 29, 2023, with the exciting title “Let’s learn about PyFixest and salary prediction models”.

Time and location: 6.30 pm to 9 pm at trivago's outstanding headquarters (in use since 2018) in the Medienhafen at Kesselstraße 5-7, 40221 Düsseldorf.

The first speaker will be Alexander Fischer from trivago, talking about PyFixest as well as how ChatGPT and Copilot can help with code development.

After him, Michael Matuschek and Tim Elfrink from StepStone Deutschland (LinkedIn profile) will take the stage and share their experiences with “The World of Salary: Introducing Salary Prediction Models”.

Since I was busy in recent weeks and failed to mark my calendar, I almost missed getting a spot at this meetup (oh no!). Fortunately, thanks to a post by co-organizer Aida Orujova, I managed to secure a place at the last minute, along with 80 attendees (and 18 on the waitlist as of today).

So I am excited to learn about interesting case studies in HR Data Science / People Analytics from two international companies based in my birthplace.

For more information and further meetings check out this website: https://www.meetup.com/dusseldorf-data-science-meetup/events/295307371/

Are you also going to the event? Then let's meet and talk! And if you cannot join but are interested in People Analytics and its applications, connect with me and let's talk as well.

All the good and best wishes!

Stefan Klemens

PS: The last meeting of the Düsseldorf Data Science Meetup Group was hosted by SMS group (LinkedIn profile) and held on June 12 this year. Read a review here.

Videos: Generation AI – Data + AI Summit 2023 & Data + AI World Tour

Video tip: Generation AI – Data + AI Summit 2023 from San Francisco at the end of June 2023. And an event tip: the Data + AI World Tour from autumn 2023, e.g. in Munich, Amsterdam, and Zurich.

Dear guest!

Today, a video tip: fascinating reports, experiences, and forecasts on all topics around data and artificial intelligence from the “Data + AI Summit 2023” by Databricks (known, among other things, for the open-source software Apache Spark), held live and online from San Francisco at the end of June 2023, featuring among others:

Ali Ghodsi (CEO and co-founder of Databricks), Satya Nadella (CEO of Microsoft), Larry Feinsmith (Managing Director, JP Morgan Chase), Marc Andreessen (Andreessen Horowitz, developer of the early web browser Netscape Communicator; see also my newsblog article: Stiglitz vs. Andreessen) and, as a special guest, Eric Schmidt (former CEO of Google and founder of the non-profit Schmidt Futures).

Interesting for me as a people data analyst and psychologist: Marc Andreessen cites, among other things, several findings from psychological research on (human) intelligence. He is particularly interested in this topic because of his child and in how artificial intelligence can support the development of the child's abilities.

Eric Schmidt also makes interesting points, for example on the future of generative artificial intelligence (frontier, special, and open-source models). According to Wikipedia, Schmidt Futures is a philanthropic venture focused on technology, science, and “Talent Networking Programs”.

Here are the links to the keynote videos of the Data + AI Summit 2023:

Keynotes from Wednesday, 28 June 2023:

Keynotes from Thursday, 29 June 2023:


Event tip: Data + AI World Tour from autumn 2023

One more event tip related to Databricks: in November 2023, their Data + AI World Tour comes live (“Registration is free”) to Germany and nearby countries, with stops in Munich, Amsterdam, Zurich, Paris, and London (as well as many other cities worldwide from the end of August 2023).

“Generative AI has changed how we build and use technology. More than ever, the lakehouse’s ability to connect data and AI can accelerate innovation for every organization. Join your peers at the Data + AI World Tour and be part of Generation AI”

All information, locations, and dates of the Databricks Data + AI World Tour: https://www.databricks.com/dataaisummit/worldtour

All the best and warm regards!

Stefan Klemens


New: Künstliche Intelligenz und Data Science in Theorie und Praxis (Artificial Intelligence and Data Science in Theory and Practice)

Dear guest,

Artificial intelligence (AI) has been the mega topic since the beginning of 2023, but for companies without data science it may be like a computer without power! So today in my newsblog I am pleased to point out a new book on data science and AI, published by Springer Spektrum on 20 June 2023 (the e-book one day earlier) under the following title:

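»Künstliche Intelligenz und Data Science in Theorie und Praxis: Von Algorithmen und Methoden zur praktischen Umsetzung in Unternehmen« (Artificial Intelligence and Data Science in Theory and Practice: From Algorithms and Methods to Practical Implementation in Companies)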

Before turning to the book by the editors Andreas Gillhuber, Göran Kauermann, and Wolfgang Hauner, I would like to answer the question of what data science is. And at the end, after my conclusion on this book presentation, you will find more than 10 sources and further literature on data science, artificial intelligence, and Python as an important programming language.

What is data science?

Data science is the application-oriented field that extracts knowledge from large amounts of data in a particular domain and turns this knowledge into useful actions for the business.

Common goals are to detect relevant patterns in data, to draw conclusions from them, and/or to model and forecast how factors (variables) develop. Visualizing data and results is essential both for understanding them and for communicating them to others.

Herter (2022, p. 26) offers a definition:

»Data science is an interdisciplinary field of science concerned with the precise digital capture, analysis, and visualization of past, present, and future phenomena of our real world, in order to optimize, in a data-driven way, the process of knowledge generation as the best possible basis for human decision-making.«

For data analysis, the field mainly uses multivariate methods from statistics and machine learning, although these areas cannot be clearly separated: regression analysis, factor analysis, and cluster analysis are well-known classical statistical methods, while algorithms (mathematical formulas, calculation rules) such as k-nearest neighbors, support vector machines (SVM), random forests, and artificial neural networks (ANN) tend to originate in artificial intelligence research.

In addition, artificial neural networks (ANN) and deep learning form an area that does not belong to classical statistics but comes from AI research, and whose results, such as text and image generators (e.g. ChatGPT and Midjourney, so-called generative models), dominate the public image of artificial intelligence.

Incidentally, besides computer scientists, other disciplines such as neuropsychology, cognitive psychology, and linguistics were also involved in developing artificial intelligence and deep learning, and some of their pioneers, such as Geoffrey Hinton, hold degrees in several subjects.

So anyone who wants to engage more deeply with artificial intelligence and understand its foundations should read a good book on data science. And anyone working as an (HR) data scientist or people analyst should always stay up to date: see what is new in this field, how others approach the task, and what we can learn from their experiences and practical examples.


Today at 7 pm online: People Analytics Tech 2023: What You Need to Know

Dear guest!

The team around Stacia Sherman Garr of RedThread Research is hosting an exciting online presentation on People Analytics today at 7 pm German time. More precisely, it covers the results of their latest market study on People Analytics technologies:

»Our latest study on the market shows that there are more vendors than ever, and the landscape is also changing rapidly. The market for vendors is very competitive, which means it is also confusing for buyers.«

People Analytics (similar terms: HR Analytics, Workforce Analytics) uses data from various areas and distills results from them to solve current challenges in human resource management (e.g. recruiting / skills shortage, turnover, talent management, training).

Since this work mostly relies on multivariate statistical methods, among other things for prediction (e.g. regression analysis), it is also called HR Data Science. And since multivariate methods are also part of machine learning, People Analytics also involves the use of artificial intelligence in HR management.

Anyone interested in the 60-minute presentation by Stacia and Priyanka Mehrotra on the People Analytics market (with “market trends, a vendor landscape framework, vendor areas of focus, and customer feedback”) can still register today at:


I'm in! Are you?

Warm regards, Stefan Klemens

PS: Fancy an exchange on People Analytics, AI in HRM, or Digital Assessment? Then connect with me and/or arrange to meet over a coffee in a video call.