cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Soma
by Valued Contributor
  • 4780 Views
  • 3 replies
  • 5 kudos

Resolved! Enable custom Ipython Extension

How to enable custom Ipython Extension on Databricks Notebook Start

  • 4780 Views
  • 3 replies
  • 5 kudos
Latest Reply
Soma
Valued Contributor
  • 5 kudos

I want to load custom extensions which I create like custom call back events on cell runhttps://ipython.readthedocs.io/en/stable/config/callbacks.html

  • 5 kudos
2 More Replies
emanuele_maffeo
by New Contributor III
  • 5884 Views
  • 5 replies
  • 8 kudos

Resolved! Trigger.AvailableNow on scala - compile issue

Hi everybody,Trigger.AvailableNow is released within the databricks 10.1 runtime and we would like to use this new feature with autoloader.We write all our data pipeline in scala and our projects import spark as a provided dependency. If we try to sw...

  • 5884 Views
  • 5 replies
  • 8 kudos
Latest Reply
Anonymous
Not applicable
  • 8 kudos

You can switch to python. Depending on what you're doing and if you're using UDFs, there shouldn't be any difference at all in terms of performance.

  • 8 kudos
4 More Replies
alonisser
by Contributor II
  • 3832 Views
  • 3 replies
  • 4 kudos

Resolved! How to migrate an existing workspace for an external metastore

Currently we're on an azure databricks workspace, we've setup during the POC, a long time ago. In the meanwhile we have built quite a production workload above databricks.Now we want to split workspaces - one for analysts and one for data engineeri...

  • 3832 Views
  • 3 replies
  • 4 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

From databricks notebook just run mysqldump. Server address and details you can take from logs or configuration.I am including also link to example notebook https://docs.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/2016-election-tweets.h...

  • 4 kudos
2 More Replies
USHAK
by New Contributor II
  • 1493 Views
  • 1 replies
  • 0 kudos

Hi , I am trying to schedule - Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python.In the cart --> I couldn't proceed ...

Hi , I am trying to schedule - Exam: Databricks Certified Associate Developer for Apache Spark 3.0 - Python.In the cart --> I couldn't proceed without entering voucher. I do not have voucher.Please help

  • 1493 Views
  • 1 replies
  • 0 kudos
Latest Reply
USHAK
New Contributor II
  • 0 kudos

Can someone Please respond to my above question ? Can i write certification test without Voucher ?

  • 0 kudos
Jeff1
by Contributor II
  • 15565 Views
  • 3 replies
  • 4 kudos

Resolved! How to convert lat/long to geohash in databricks using geohashTools R library

I continues to receive a parsing error when attempting to convert lat/long data to a geohash in data bricks . I've tried two coding methods in R and get the same error.library(geohashTools)Method #1my_tbl$geo_hash <- gh_encode(my_tbl$Latitude, my_tbl...

  • 15565 Views
  • 3 replies
  • 4 kudos
Latest Reply
Jeff1
Contributor II
  • 4 kudos

The problem was I was trying to run the gh_encode function on a Spark dataframe. I needed to collect the date into a R dataframe then run the function.

  • 4 kudos
2 More Replies
manasa
by Contributor
  • 20180 Views
  • 3 replies
  • 7 kudos

Resolved! How to set retention period for a delta table lower than the default period? Is it even possible?

I am trying to set retention period for a delta by using following commands.deltaTable = DeltaTable.forPath(spark,delta_path)spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")deltaTable.logRetentionDuration = "interval 1...

  • 20180 Views
  • 3 replies
  • 7 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

There are two ways:1) Please set in cluster (Clusters -> edit -> Spark -> Spark config):spark.databricks.delta.retentionDurationCheck.enabled false 2) or just before DeltaTable.forPath set (I think you need to change order in your code):spark.conf.se...

  • 7 kudos
2 More Replies
AmanSehgal
by Honored Contributor III
  • 6366 Views
  • 5 replies
  • 12 kudos

Resolved! Query delta tables using databricks cluster in near real time.

I'm trying to query delta tables using JDBC connector in a Ruby app. I've noticed that it takes around 8 seconds just to connect with databricks cluster and then additional time to run the query.The app is connected to a web portal where users genera...

  • 6366 Views
  • 5 replies
  • 12 kudos
Latest Reply
User16763506477
Databricks Employee
  • 12 kudos

Hi @Aman Sehgal​ Could you please check SQL endpoints? SQL endpoint uses a photon engine. It can reduce the query processing time. And Serverless SQL endpoint can accelerate the launch timemore info: https://docs.databricks.com/sql/admin/sql-endpoin...

  • 12 kudos
4 More Replies
zayeem
by New Contributor
  • 3770 Views
  • 1 replies
  • 3 kudos

Resolved! Databricks - Jobs Last run date

Is there a way to get the last run date of job(s) ? I am trying to compile a report and trying to see if this output exists either in databricks jobs cli output or via api?

  • 3770 Views
  • 1 replies
  • 3 kudos
Latest Reply
AmanSehgal
Honored Contributor III
  • 3 kudos

Sure. Using Databricks jobs API you can get this information.Use the following API endpoint to get list of all the jobs and their executions till date in descending order.You can pass job_id as parameter to get runs of a specific job.https://<databri...

  • 3 kudos
Anonymous
by Not applicable
  • 1284 Views
  • 0 replies
  • 3 kudos

March Madness + Data  Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags &#xd83c;&#xdf89; ...

March Madness + Data Here at Databricks we like to use (you guessed it) data in our daily lives. Today kicks off a series called Databrags Databrags are glimpses into how Bricksters and community folks like you use data to solve everyday problems, e...

  • 1284 Views
  • 0 replies
  • 3 kudos
Abel_Martinez
by Contributor
  • 2803 Views
  • 1 replies
  • 1 kudos

Resolved! Create data bricks service account

Hi all, I need to create service account users who can only query some delta tables. I guess I do that by creating the user and granting select right to the desired tables. But Data bricks requests a mail account for these users. Is there a way to cr...

  • 2803 Views
  • 1 replies
  • 1 kudos
Latest Reply
Abel_Martinez
Contributor
  • 1 kudos

HI @Kaniz Fatma​ , I've checked the link but the standard method requires a mailbox and the user creation using SCIM API looks too complicated. I solved the issue, I created a mailbox for the service account and I created the user using that mailbox....

  • 1 kudos
lecardozo
by New Contributor II
  • 7134 Views
  • 5 replies
  • 1 kudos

Resolved! Problems with HiveMetastoreClient and internal Databricks Metastore.

I've been trying to use ​the HiveMetastoreClient class in Scala to extract some metadata from Databricks internal Metastore, without success. I'm currently using the 7.3 LTS runtime.​The error seems to be related to some kind of inconsistency between...

  • 7134 Views
  • 5 replies
  • 1 kudos
Latest Reply
lecardozo
New Contributor II
  • 1 kudos

Thanks for the reference, @Atanu Sarkar​ .​Seems a little odd to me that I'd need to change the internal Databricks Metastore table to add a column expected by the client default Scala client. I'm afraid this could cause issues with other users/jobs ...

  • 1 kudos
4 More Replies
irfanaziz
by Contributor II
  • 9467 Views
  • 4 replies
  • 0 kudos

Resolved! If two Data Factory pipelines are run at the same time or share a window of execution do they share the Databricks spark cluster(if both have the same linked service)? ( job clusters are those that are create on the go, defined in the linked service).

Continuing the above case, does that mean if i have several like 5 ADF pipelines scheduled regularly at the same time, its better to use an existing cluster as all of the ADF pipelines would share the same cluster and hence the cost will be lower?

  • 9467 Views
  • 4 replies
  • 0 kudos
Latest Reply
Atanu
Databricks Employee
  • 0 kudos

for adf or job run we always prefer job cluster. but for streaming, you may consider using interactive cluster . but anyway you need to monitor the cluster load, if loads are high there will be chance to job slowness as well as failure. also data siz...

  • 0 kudos
3 More Replies
gibbona1
by New Contributor II
  • 5903 Views
  • 2 replies
  • 1 kudos

Resolved! Correct setup and format for calling REST API for image classification

I trained a basic image classification model on MNIST using Tensorflow, logging the experiment run with MLflow.Model: "my_sequential" _________________________________________________________________ Layer (type) Output Shape ...

mnist_model_error
  • 5903 Views
  • 2 replies
  • 1 kudos
Latest Reply
Atanu
Databricks Employee
  • 1 kudos

@Anthony Gibbons​  may be this git should work with your use case - https://github.com/mlflow/mlflow/issues/1661

  • 1 kudos
1 More Replies
matt_t
by New Contributor
  • 4999 Views
  • 2 replies
  • 1 kudos

Resolved! S3 sync from bucket to a mounted bucket causing a "[Errno 95] Operation not supported" error for some but not all files

Trying to sync one folder from an external s3 bucket to a folder on a mounted S3 bucket and running some simple code on databricks to accomplish this. Data is a bunch of CSVs and PSVs.The only problem is some of the files are giving this error that t...

  • 4999 Views
  • 2 replies
  • 1 kudos
Latest Reply
Atanu
Databricks Employee
  • 1 kudos

@Matthew Tribby​  does above suggestion work. Please let us know if you need further help on this. Thanks.

  • 1 kudos
1 More Replies
bonjih
by New Contributor
  • 9402 Views
  • 3 replies
  • 3 kudos

Resolved! AttributeError: module 'dbutils' has no attribute 'fs'

Hi,Using db in SageMaker to connect EC2 to S3. Following other examples I get 'AttributeError: module 'dbutils' has no attribute 'fs'....I guess Im missing an import?

  • 9402 Views
  • 3 replies
  • 3 kudos
Latest Reply
Atanu
Databricks Employee
  • 3 kudos

agree with @Werner Stinckens​  . also may try importing dbutils - @ben Hamilton​ 

  • 3 kudos
2 More Replies

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now
Labels