Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

missyT
by New Contributor III
  • 1414 Views
  • 3 replies
  • 4 kudos

Resolved! AI assistant and machine Learning

I am looking to create a basic virtual assistant (AI) that implements machine learning mechanisms. I have some basic knowledge of Python, and I have seen some courses on the internet (YouTube in particular) that look very interesting. But for the moment...

Latest Reply
valeryuaba
New Contributor III
  • 4 kudos

Hey everyone! I'm really excited about this topic since I'm a huge fan of AI assistants and machine learning. MissyT, creating a basic virtual assistant with machine learning capabilities is an excellent idea! With your basic knowledge of Python and...

2 More Replies
Data4
by New Contributor II
  • 2309 Views
  • 1 reply
  • 5 kudos

Resolved! Load multiple Delta tables at once from SQL Server

What’s the best way to efficiently move multiple SQL Server tables in parallel into Delta tables?

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 5 kudos

@Data4   To enable parallel read and write operations, the ThreadPool functionality can be leveraged. This process involves specifying a list of tables that need to be read, creating a method for reading these tables from the JDBC source and saving t...

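Tharun-Kumar's ThreadPool approach can be sketched roughly as follows; the table names and the commented-out JDBC/Delta calls are hypothetical placeholders, not the exact code from the truncated reply:

```python
# A minimal sketch of parallel table copies with ThreadPool.
# Table list, JDBC URL, and target names are hypothetical.
from multiprocessing.pool import ThreadPool

tables = ["dbo.customers", "dbo.orders", "dbo.products"]

def copy_table(table):
    # In a Databricks notebook this body would be a JDBC read + Delta write:
    # (spark.read.format("jdbc")
    #      .option("url", jdbc_url)
    #      .option("dbtable", table)
    #      .load()
    #      .write.format("delta")
    #      .mode("overwrite")
    #      .saveAsTable(table.split(".")[-1]))
    return table.split(".")[-1]  # target table name, for illustration

# Threads work well here because the heavy lifting happens on the cluster,
# not in the driver's Python process.
with ThreadPool(processes=4) as pool:
    results = pool.map(copy_table, tables)
```

Sizing the pool to the number of concurrent JDBC connections the source server can comfortably serve is usually the main tuning knob.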
erigaud
by Honored Contributor
  • 4386 Views
  • 5 replies
  • 6 kudos

Resolved! SFTP Autoloader

Hello! I am wondering if it is possible to ingest files from an SFTP server using Auto Loader, or do I have to first copy the files to DBFS and then use Auto Loader on that location? Thank you!

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @erigaud We haven't heard from you since the last response from @BriceBuso, and I was checking back to see if their suggestions helped you. Otherwise, if you have found a solution, please share it with the community, as it can be helpful to others. Al...

4 More Replies
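Auto Loader reads from cloud storage paths, so a common pattern for the question above is a two-step flow: copy files from the SFTP server into a landing location first, then point Auto Loader at it. A minimal sketch, with hypothetical paths and options:

```python
# Hypothetical landing path where an SFTP sync step (e.g. paramiko, ADF, or
# a managed file-transfer service) drops the files.
landing_path = "dbfs:/mnt/landing/sftp"

def autoloader_options(file_format, schema_path):
    # Options typically passed to spark.readStream.format("cloudFiles")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_path,
    }

opts = autoloader_options("csv", "dbfs:/mnt/landing/_schema")

# In a notebook:
# df = (spark.readStream.format("cloudFiles")
#           .options(**opts)
#           .load(landing_path))
```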
BriceBuso
by Contributor II
  • 3790 Views
  • 3 replies
  • 3 kudos

Run multiple % commands in the same cell

Hello, is there a way to run multiple % commands in the same cell? I heard that's not possible, but I would like confirmation, and maybe it could be an idea for future updates. Moreover, is there a way to mask the output of cells (especially markdown) w...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @BriceBuso Hope all is well! Just wanted to check in to see if you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks...

2 More Replies
bchaubey
by Contributor II
  • 2338 Views
  • 4 replies
  • 3 kudos

Data Pull from S3

I have some files in S3 that I want to process through Databricks. How is this possible? Could you please help me with this?

Latest Reply
dream
Contributor
  • 3 kudos

access_key = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
secret_key = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
encoded_secret_key = secret_key.replace("/", "%2F")
aws_bucket_name = "<aws-bucket-name>"
mount_name = "<m...

3 More Replies
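The truncated reply above appears to follow the classic S3 mount pattern; here is a sketch of the same idea with hypothetical bucket and mount names (the key-encoding step is why the reply replaces "/" with "%2F"):

```python
from urllib.parse import quote

# Hypothetical values; in Databricks these come from dbutils.secrets.get(...)
access_key = "AKIAEXAMPLE"
secret_key = "abc/def"          # any "/" must be URL-encoded for the mount URI
encoded_secret_key = quote(secret_key, safe="")
aws_bucket_name = "my-bucket"
mount_name = "my-mount"

source = f"s3a://{access_key}:{encoded_secret_key}@{aws_bucket_name}"

# In a notebook:
# dbutils.fs.mount(source, f"/mnt/{mount_name}")
# display(dbutils.fs.ls(f"/mnt/{mount_name}"))
```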
baatchus
by New Contributor III
  • 4318 Views
  • 3 replies
  • 1 kudos

Deduplication, Bronze (raw) or Silver (enriched)

Need some help in choosing where to do deduplication of data. I have sensor data in blob storage that I'm picking up with Databricks Autoloader. The data and files can have duplicates in them. Which of the two options do I choose? Option 1: Cre...

Latest Reply
Tharun-Kumar
Honored Contributor II
  • 1 kudos

@peter_mcnally You can use a watermark to handle late records and send only the latest records to the bronze table. This will ensure that you always have the latest information in your bronze table. This feature is explained in detail here - https://w...

2 More Replies
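The watermark-based deduplication the reply points to looks roughly like the commented PySpark below (column names are hypothetical); the plain-Python function underneath illustrates the same keep-first-per-key idea:

```python
# Streaming form (hypothetical columns), as suggested in the reply:
# deduped = (raw_stream
#     .withWatermark("event_time", "10 minutes")
#     .dropDuplicates(["sensor_id", "event_time"]))

def drop_duplicates(records, keys):
    """Keep the first record seen for each key tuple."""
    seen, out = set(), []
    for rec in records:
        k = tuple(rec[key] for key in keys)
        if k not in seen:
            seen.add(k)
            out.append(rec)
    return out

rows = [
    {"sensor_id": 1, "event_time": "t1", "value": 10},
    {"sensor_id": 1, "event_time": "t1", "value": 10},  # duplicate
    {"sensor_id": 2, "event_time": "t1", "value": 7},
]
deduped = drop_duplicates(rows, ["sensor_id", "event_time"])
```

The watermark bounds how much state Spark keeps for the duplicate check, which is what makes this practical on an unbounded stream.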
anastassia_kor1
by New Contributor
  • 4728 Views
  • 2 replies
  • 1 kudos

Error "Distributed package doesn't have nccl built in" with Transformers Library.

I am trying to run a simple training script using HF's transformers library and am running into the error `Distributed package doesn't have nccl built in`. Runtime: DBR 13.0 ML - Spark 3.4.0 - Scala 2.12. Driver: i3.xlarge - 4 cores. Note: This is a...

Latest Reply
patputnam-db
New Contributor II
  • 1 kudos

Hi @anastassia_kor1, for CPU-only training, TrainingArguments has a no_cuda flag that should be set. For transformers==4.26.1 (MLR 13.0) and transformers==4.28.1 (MLR 13.1), there's an additional xpu_backend argument that needs to be set as well. Try u...

1 More Replies
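Based on the (truncated) reply, a CPU-only configuration might look like the sketch below; the `xpu_backend` value is an assumption, since the reply is cut off before naming one:

```python
# Hedged sketch: arguments intended to keep Trainer off the NCCL backend.
cpu_args = {
    "output_dir": "/tmp/out",   # hypothetical path
    "no_cuda": True,            # per the reply: skip CUDA/NCCL initialization
    "xpu_backend": "gloo",      # assumption; the reply truncates before the value
}

# In a notebook with transformers installed:
# from transformers import TrainingArguments
# training_args = TrainingArguments(**cpu_args)
```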
mshettar
by New Contributor II
  • 1491 Views
  • 2 replies
  • 0 kudos

Databricks CLI's workspace export_dir command adds unnecessary edits despite not making any change in the workspace

The databricks workspace export_dir / export command with the overwrite option enabled introduces spurious changes in the target directory: 1. newline deletions, and 2. additions/deletions of MAGIC comments, despite no meaningful changes having been made in th...

(Screenshot attached: 2023-06-06, 2.44.48 PM)
Latest Reply
RyanHager
Contributor
  • 0 kudos

I am encountering this issue as well and it did not happen previously.  Additionally, you see this pattern if you are using repos internally and make a change to a notebook in another section.

1 More Replies
Chakra
by New Contributor II
  • 1471 Views
  • 1 reply
  • 1 kudos

Create job cluster with a docker image in azure data factory

Is there a way to create a job cluster in Azure Data Factory with a Docker image, either through the API or the UI?

Latest Reply
m_szklarczyk
New Contributor II
  • 1 kudos

Has anyone figured out how to run a custom image as job compute from ADF?

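ADF's Databricks linked service does not expose a Docker image field, but a job cluster spec submitted through the Databricks Jobs API can carry one via Databricks Container Services. A sketch with hypothetical registry and node details:

```python
# Hypothetical job-cluster spec fragment; the docker_image block is what
# Databricks Container Services reads.
new_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "docker_image": {
        "url": "myregistry.azurecr.io/my-image:latest",
        "basic_auth": {"username": "<user>", "password": "<token>"},
    },
}
```

One possible workaround from ADF is to call the Jobs API directly (for example, via a Web activity) with a spec like this, instead of relying on the linked service UI.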
Shay83
by New Contributor II
  • 1112 Views
  • 1 reply
  • 3 kudos

Resolved! Stream from a specific time

Hello, how should I start streaming a Delta table from a specific point in time?

Latest Reply
Lakshay
Esteemed Contributor
  • 3 kudos

If you are streaming from a Delta table, you can specify the starting version or timestamp. You can refer to the document for complete details: https://docs.databricks.com/structured-streaming/delta-lake.html#specify-initial-position

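The options Lakshay refers to look like this in practice (the table path and timestamp below are hypothetical):

```python
# Either option pins where the Delta stream starts reading.
opts = {"startingTimestamp": "2023-06-01T00:00:00.000Z"}
# or: opts = {"startingVersion": "5"}

# In a notebook:
# df = (spark.readStream.format("delta")
#           .options(**opts)
#           .load("/delta/events"))
```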
User16826992666
by Valued Contributor
  • 1676 Views
  • 5 replies
  • 1 kudos

When developing a Delta Live Table, can I test my code in the notebook?

Just wondering about the dev flow when building Delta Live Tables. I write my code in the notebook so it would be useful to be able to test it out from within that environment.

Latest Reply
Lakshay
Esteemed Contributor
  • 1 kudos

You need to create a DLT pipeline to test the code. 

4 More Replies
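One way to keep some of the feedback loop in the notebook, even though a pipeline is required to actually run DLT, is to factor the transformation logic into plain functions that can be exercised directly; a sketch with hypothetical names:

```python
def clean_events(rows):
    # Pure transformation that a DLT table function could delegate to,
    # testable in the notebook without a pipeline.
    return [r for r in rows if r.get("value") is not None]

# Inside the pipeline, a table definition would wrap the same logic:
# import dlt
# @dlt.table
# def events_clean():
#     return spark.read.table("events_raw").where("value IS NOT NULL")

sample = [{"value": 1}, {"value": None}]
checked = clean_events(sample)
```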
TonyLe
by New Contributor
  • 1419 Views
  • 2 replies
  • 1 kudos

IllegalArgumentException: requirement failed: Invalid uri

Hi all, I'm trying to connect to MongoDB using the Databricks notebook. I keep getting the error that my MongoDB uri is invalid. The uri works when connecting from my local machine using the Rust driver. I pretty much followed the tutorial that was g...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Also take in mind firewall issues in case the mongodb is on-prem or on some other location.

1 More Replies
tirato
by New Contributor II
  • 2075 Views
  • 3 replies
  • 2 kudos

Resolved! Cannot import-dir from AzureDevops, but works fine locally.

Hello, as I'm trying to create a CI/CD pipeline for the project, I'm finding myself stuck. I tried to upload the notebooks from my Azure DevOps release and I'm getting a 403 Forbidden error. I used `cat ~/.databrickscfg` and matched it with the local config that I...

Latest Reply
valeryuaba
New Contributor III
  • 2 kudos

Hey everyone! I can totally relate to the frustration of encountering authentication issues when setting up a CI/CD pipeline. It's great that you're able to import the notebooks locally, but facing difficulties on Azure DevOps can be quite puzzling.F...

2 More Replies
brickster_2018
by Esteemed Contributor
  • 2701 Views
  • 2 replies
  • 1 kudos

Resolved! Cluster Health Dashboard

Is there a cluster health dashboard that has the total number of running interactive clusters and the total number of job clusters? Also, one that flags clusters with issues.

Latest Reply
valeryuaba
New Contributor III
  • 1 kudos

Thanks!

1 More Replies