Conversations with Young Lives' Data Managers: Part Three

Ahead of the publication of our new report on data management, we sat down to speak with some of Young Lives’ Data Managers, past and present, about their experiences of the role.

This is the third and final conversation in the series, with Anne Yates Solon (International Data Manager 2007-2018), Monica Lizama (Peru’s Data Manager), Tien Nguyen (Vietnam’s Data Manager), Shyam Sunder (India’s Data Manager) and Hazel Ashurst (Data Coordinator, Oxford 2011-2013).

Catch up here with the first and second parts in this series.


Before the severity of the COVID-19 pandemic hit, Young Lives was setting up for our first round of ‘Young Lives at Work’, following our young people as they enter their early to mid-twenties. Rounds 6 and 7 have since been preceded by our phone survey studying the challenges the young people face due to the coronavirus and the ensuing lockdown.

These interviews took place before we had arranged our phone survey.


Looking forward to the Young Lives Round 6 and Round 7 surveys, what do you see as the main challenges? What are the opportunities?

Anne Yates Solon: I think one of the main considerations will be attrition. We haven’t been able to undertake a typical tracking round because of how the funding and timing worked out. I’d be slightly concerned that our attrition figures will be higher, especially as our cohorts age and are leaving their household settings. That’s my main concern.

Tien Nguyen: Yes, it will be challenging to find the children. Now they’re older and move to find work. They go to different places and don’t stay in their home village. We have to spend a lot of time to try and find them.

Shyam Sunder: The main challenge for Round 6 is that our sample respondents have become adults and getting information from them and finding time to meet with them will be a challenging task at the field level. A lot of travel will also have to be made to contact respondents.

Monica Lizama: I think a challenge will be that many young people have completed their studies and then they move. They’re looking for new jobs or starting their own families. Their life is changing as they grow up and in general, they are going new places. Another challenge is that many of them don’t have much time. Now they’re people with jobs and families, often in urban areas; very few of them stay in the rural areas. They migrate to cities, or the capital, and in cities the jobs are sometimes from 8am to 9pm. So, they get home and they’re very tired, and the enumerator needs to work with their schedule by interviewing very early in the morning, very late at night or on the weekend. Sometimes it takes a lot of effort to convince participants to continue with Young Lives under these circumstances. In our tracking phone calls, some young people say “oh, another round? I have to answer the survey again?”, but most are actually content to continue with the study.

Hazel Ashurst: In terms of data management, we won’t need to change much for the next survey. I think the tablets and SurveyBe[1]  were successful. It is labour intensive, processing the data, but that’s unavoidable.


What was your experience with field management and piloting?

Anne: We piloted in the field. Early on I used to go to all the countries for the pilot process. There were two types of piloting that happened after the implementation of Computer Assisted Personal Interviewing (CAPI): firstly, piloting of the content of the questionnaires, to make sure we were asking the correct and appropriate questions, and secondly, piloting of the CAPI program to check skips, translations etc.

Then there were debriefs, and we would go over the changes and approve the addition or dropping of questions, and then adjust the CAPI program accordingly. Also, before piloting, we would meet with the fieldworkers to go over the questionnaire section by section and the fieldworkers would get adamant about questions, in terms of what we could or couldn’t ask.

It was important to take their opinions on board, since those were the people in the field chatting to the participants. It always gave me faith in our field teams, because they were so good at feeding back context specific information.

Tien: I went to the field and sometimes attended the interviews to make sure the fieldworkers followed the instructions correctly.


What was the process of archiving like? Did you ever have to re-archive data?

Anne: We were obligated to publicly archive data. You can clean data forever. Data will never be 100% clean. We had to determine a cut-off point for when it was clean enough. Once we reached that point, all data was then anonymized and all of the documentation around samples, questionnaire designs, variables, any reference documents related to the data, was then archived with the UK Data Archive. They would review the data, highlighting any queries, and we’d then figure out if we wanted to drop variables etc. Then it would go live.

We absolutely had to re-archive. It goes back to the mistake I mentioned earlier, realising, for example, when a person wasn’t deceased. I would keep track of what data was submitted when, and I would adjust the data for that round. But I would always hold off… for example, when I was ready to archive Round 3, I would re-archive any data from Round 1 and 2 that I needed to archive... I made sure I did it alongside the main archiving.

Hazel: Archiving was done mainly by Anne. She went to a local archive in Essex and would send anonymized data. She would have to sign off the data set to ensure it was closed, clean, and ready for archive. She would then include a data dictionary.

Anne also worked hard on the panel data set, which was a subset.


Can you explain the concept of data linking? E.g. Instruments and constructs that are linked across the rounds—different questions at different ages.

Anne: I remember with the implementation of the school survey, they were going to Young Lives' schools. Historically we would ask the name of the school in the household survey, but that would have been a string variable [2] that I would have removed. But when we received the school survey data, we coded the schools and those then needed to be linked up to the Young Lives’ kids that went to those schools. So, we had to go back in and code all the school names in the data! Looking back, I would say for a longitudinal process you should consider coding multiple aspects because you don’t know what’s going to be asked in the future.

That’s been another good challenge for Young Lives. Things changed as we went on, such as adding a school survey we hadn’t planned for. If it had been planned for, you might have set things up differently at the beginning.

Over a project of x amount of years, things are going to change that are going to make you have to retrofit previous work. I had to go back and get all the names of all the schools, and all the names of all the kids. Then I had to go back and try to figure it all out…people would enter the data differently. They wouldn’t all spell the school names correctly. So, I had to build a penultimate list of schools. Then I had to go to the country manager, and they had to check to make sure “St. Teresa’s” wasn’t the same as “St. Mary Teresa’s.” Then we would code those; we had to write a code that would find those words in the dataset and code them, but then those words weren’t always spelled correctly.
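
The coding step Anne describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure Young Lives actually used: the function names (`normalise`, `code_school`), the school codes and the two similarity thresholds are all hypothetical, and the comparison relies on the standard library's `difflib.SequenceMatcher` rather than any tool named in the interview. Misspelled names that closely resemble a school on the master list are coded automatically, while names that are merely similar are flagged so that someone who knows the local context can decide.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lower-case a school name, strip accents and punctuation, collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    # Drop combining accents; letters with no ASCII base form are dropped below.
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]", "", unaccented)
    return " ".join(cleaned.split())


def code_school(raw_name, master, auto=0.9, review=0.7):
    """Return (code, status) for one school name as it was typed in the field.

    master maps normalised school names to codes (hypothetical values).
    auto and review are illustrative similarity thresholds, not tuned ones.
    """
    key = normalise(raw_name)
    if key in master:
        return master[key], "exact"

    best_name, best_score = None, 0.0
    for candidate in master:
        score = SequenceMatcher(None, key, candidate).ratio()
        if score > best_score:
            best_name, best_score = candidate, score

    if best_score >= auto:
        return master[best_name], "auto"      # almost certainly a misspelling
    if best_score >= review:
        return master[best_name], "review"    # similar: needs a human decision
    return None, "unmatched"                  # probably a school not yet listed


master = {"st teresas": 101}
for typed in ["St. Teresa's", "St. Theresa's", "St. Mary Teresa's"]:
    print(typed, code_school(typed, master))
```

With only “St. Teresa’s” on the master list, “St. Mary Teresa’s” lands in the review band rather than being merged silently, which is the kind of case that went to the country manager.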


What have you learned / what has the organisation learned over the years?

Anne: I’ve learned a lot, especially when it comes to a longitudinal process. There are certain things you can do when you’re setting up a longitudinal research project in Round 1 that we know to do now, because we’ve had to go back and correct it. Things become apparent across time. You can be sitting in Round 3 thinking “why didn’t we do that in Round 1? It seems so obvious!” For instance, if we asked the same question across several rounds, we didn’t have a system for the variable name staying the same across six rounds, with a round identifier. So, I think from Round 3 onwards we have a round identifier such as “R3, R4, R5” built into the variable names, which should have been done from Round 1.
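
As a rough illustration of the naming convention Anne mentions, the sketch below keeps one stem per question and attaches a round tag when rounds are combined into a single record per child. It is a simplified example, not the Young Lives data structure: the variable names (`child_id`, `enrolled`, `hh_size`), the `_r3`-style suffix and both function names are assumptions made for the example.

```python
def tag_round(record, round_no, id_vars=("child_id",)):
    """Return a copy of one round's record with a round tag on each variable name.

    Identifier variables are left untouched so rounds can still be linked.
    """
    tag = f"_r{round_no}"
    return {
        name if name in id_vars else name + tag: value
        for name, value in record.items()
    }


def merge_rounds(rounds, id_var="child_id"):
    """Combine per-round records into one wide record per child.

    rounds maps a round number to a list of records (dicts) for that round.
    """
    wide = {}
    for round_no in sorted(rounds):
        for record in rounds[round_no]:
            row = wide.setdefault(record[id_var], {})
            row.update(tag_round(record, round_no, id_vars=(id_var,)))
    return wide


# Hypothetical data: the same question asked in Round 1 and Round 3.
rounds = {
    1: [{"child_id": "PE01", "enrolled": 1}],
    3: [{"child_id": "PE01", "enrolled": 0}],
}
print(merge_rounds(rounds))
```

Because the stem stays the same and only the tag changes, the same question can be picked out across rounds with a simple pattern, which is much harder when each round names the variable differently.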

Monica: There are few studies in Peru like Niños del Milenio (Young Lives' Peru) that are longitudinal. We have all learned lessons along the way. I know a lot more about the field now than when I started. Each village or family may be distinct, but we want the survey to be distributed in the same way, so that we can get the most accurate data possible. We train our enumerators before they go into the field. However, when people go to the field they will encounter many different situations. I get many calls from the field with interviewers asking me how to respond to various situations. I try to make sure my responses are uniform, so that everyone has the same information and the survey is conducted uniformly. I am not the field coordinator, my colleague Sofia is, but I am very involved in this process. I know the enumerators and I participate in the trainings when we begin rounds of data collection. In addition, training in CAPI is the most important thing.


In your experience, what are the factors that are important to successful long-term international research collaborations?

Tien: I think for me, the people involved in the project are important. We must be able to work together. It’s better to work together when we know each other very well. If we change the staff, we have to take time to get to know them, and how to work with them, and they have to take time to learn about the project.

Anne: The countries where we were able to keep the data manager consistent and onboard were the easiest to work with. The data was the cleanest. It’s important, if you can, to maintain consistent staff and invest in their capacity. We kept consistent staff, and made sure they were happy and trained, and were supported with everything they needed to get their job done well.

Monica: Communication is key. We are always in communication with each other and know each other well after years of working together. For me what was very important was in 2012, after Round 3, the data managers in all the study countries went to Oxford for training about CAPI and it helped us get to know each other personally. We always stayed in touch after that. We helped each other with any challenges that came up, especially in terms of programming CAPI. We relied on each other. Of course, some people left, but many others stayed, and we have developed successful working relationships based on personal connections and communicating a lot.

Hazel: We had excellent relationships with the data managers in all four countries, which was really helpful. Before, during and after data collection, we were in touch with them. I also met data managers in person when they came to Oxford for training sessions in CAPI. Each data manager had different skills and personalities, but they were all great to work with.

The Research Assistants (RAs) were also all really good and hardworking, and they drove the process! Close collaboration between RAs and data management processes was really valuable. We just felt that we were all working together on a valuable project. We were all on the same team. Not only did we work across countries, but we also worked across an interesting point in time in terms of technology. I remember my time at Young Lives fondly, as everyone was so nice, and we had such good relationships with everyone.



[1] Surveybe is a data collection and management software that utilizes computer-assisted personal interviewing (CAPI).

[2] A string variable is a variable that contains not just numbers, but other characters such as letters and punctuation.