Siftee.io – Data Engineer Intern (API and Data Acquisition Focus)

Company
Siftee.io
belmondgroup.co
Designation
Data Engineer Intern (API and Data Acquisition Focus)
Date Listed
25 Aug 2023
Job Type
Entry Level / Junior Executive
Free/ProjPart/TempIntern/TS
Job Period
Immediate Start - Flexible End
Profession
IT / Information Technology
Industry
Computer and IT
Location Name
Prinsep Street, Singapore
Address
Prinsep St, Singapore
Allowance / Remuneration
$1,000 - 1,200 monthly
Company Profile

Siftee.io (VC-backed data discovery start-up)

Siftee.io is a VC-backed company founded by ex-McKinsey and Salesforce leaders. Our vision is to be the first and only place analysts will come to find external data.

We help users search and download data (both public and premium), using our smart filters to match search results to their intended analysis. We also help premium data providers get discovered earlier in the search process, with a data quality score that distinguishes them from the competition.

We want to be the #1 data search platform within the next 5 years.

Job Description

We have a vision to allow our users to search at least 50,000 data sources on Siftee in the next 6 months, so our priority is to improve and scale the data acquisition process. The chosen candidate will work alongside the Chief Product Officer and Full Stack Developer to achieve this.

What you will do:

  • Dive deep into the challenges our customers at Siftee face, identifying the best sources of data to answer their questions and serve their needs.
  • Craft and refine our data acquisition tools, from building new web scrapers to ensuring the reliability and efficiency of our existing data pipelines.
  • Develop and deploy scalable solutions to web and data challenges, utilizing both statistical methods and machine learning techniques.

The ideal profile:

  • You're passionate about the entire data lifecycle, from discovery and scraping to cleaning and ingestion.
  • Hands-on experience with web scraping pipelines, including crafting spiders, bypassing bot prevention strategies, and ensuring data integrity.
  • Proficient with popular scraping tools and libraries like BeautifulSoup, XPath, Selenium, Puppeteer, and Splash.
  • Adept at extracting data from a variety of formats and interfaces, including HTML, XML, REST and GraphQL APIs, PDFs, and spreadsheets.
  • Skilled in fortifying web scrapers against common obstacles like bot detection, site bans, CAPTCHA challenges, and proxy issues.
  • Solid grounding in Object-Oriented Programming, SQL, and Django ORM basics.
This position is already closed and no longer available.
