Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shreyag
by New Contributor II
  • 2000 Views
  • 4 replies
  • 2 kudos

Resolved! scheduling tasks through CLI

Is there a way to schedule tasks or jobs through the Databricks CLI instead of the GUI? I want to be able to create a job flow with different notebooks through the CLI.

Latest Reply
Atanu
Esteemed Contributor
  • 2 kudos

I agree with @Kaniz Fatma: https://docs.databricks.com/dev-tools/cli/jobs-cli.html?_ga=2.101966982.684786035.1646666830-480220406.1638459894 is the Jobs CLI we currently support, @Shreya Gupta.
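
For readers who want to see what a scheduled notebook flow could look like, here is a minimal sketch that builds a job definition as JSON. It assumes the Jobs API 2.1 field layout (`tasks`, `depends_on`, `schedule`); the helper name `build_job_spec`, the notebook paths, and the cluster ID are hypothetical placeholders, not values from this thread.

```python
import json


def build_job_spec(name, notebook_paths, cluster_id, cron, timezone="UTC"):
    """Build a multi-task job definition where each notebook runs after the previous one."""
    tasks = []
    for index, path in enumerate(notebook_paths):
        task = {
            "task_key": f"task_{index}",
            "notebook_task": {"notebook_path": path},
            "existing_cluster_id": cluster_id,
        }
        # Chain the tasks: every task except the first depends on its predecessor.
        if index > 0:
            task["depends_on"] = [{"task_key": f"task_{index - 1}"}]
        tasks.append(task)
    return {
        "name": name,
        "tasks": tasks,
        # Quartz cron syntax: seconds, minutes, hours, day-of-month, month, day-of-week.
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
    }


# Hypothetical example values; replace with real notebook paths and a real cluster ID.
spec = build_job_spec(
    name="nightly-notebook-flow",
    notebook_paths=["/Shared/ingest", "/Shared/transform"],
    cluster_id="0000-000000-example",
    cron="0 0 2 * * ?",
)
job_json = json.dumps(spec, indent=2)
print(job_json)
```

The printed JSON can be saved to a file and passed to the CLI's job creation command. The exact flag depends on the CLI version, so check the Jobs CLI page linked above before relying on it.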

  • 2 kudos
3 More Replies
Alx
by New Contributor
  • 2474 Views
  • 3 replies
  • 1 kudos

Resolved! Problem with network security group (NSG) rules in case of VNet injection

Hi everyone, Our internal company security policy for cloud infrastructure requires a custom outbound NSG rule that denies all traffic. The rule's attributes should be as follows: Priority: 4096, Port: Any, Protocol: Any, Source: Any, Destination: An...

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

Hello @Alexey Tyulyaev, please check https://docs.microsoft.com/en-us/azure/virtual-network/manage-network-security-group

  • 1 kudos
2 More Replies
alejandrofm
by Valued Contributor
  • 3980 Views
  • 3 replies
  • 3 kudos

Resolved! Delta, the specified key does not exist error

Hi, I'm getting this error too frequently on a few tables. I checked on S3, and the partition exists and the file is there in the partition. error: Spectrum Scan Error: DeltaManifest, code: 15005, context: Error fetching Delta Lake manifest delta/product/sub_...

Latest Reply
alejandrofm
Valued Contributor
  • 3 kudos

@Hubert Dudek, I'll add that sometimes just running GENERATE symlink_format_manifest FOR TABLE schema.table solves it. But how can the symlink get broken? Thanks!
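
A symlink manifest is a snapshot of the table's data files at the time it was generated, so one plausible cause (not confirmed in this thread) is that later writes or a VACUUM removed files the manifest still lists. The sketch below only builds the two SQL statements that are commonly used to keep the manifest current. The helper name `manifest_statements` is hypothetical, and whether automatic manifest updates suit this particular setup is an assumption.

```python
def manifest_statements(table_name):
    """Return SQL statements that regenerate the manifest and enable automatic updates."""
    return [
        # One-off regeneration, the command quoted in the reply above.
        f"GENERATE symlink_format_manifest FOR TABLE {table_name}",
        # Delta table property that asks Delta to rewrite the manifest on every write.
        f"ALTER TABLE {table_name} SET TBLPROPERTIES "
        "(delta.compatibility.symlinkFormatManifest.enabled = true)",
    ]


for statement in manifest_statements("schema.table"):
    print(statement)
```

In a notebook, each statement could be passed to `spark.sql(...)`. The sketch prints them instead so that it runs without a cluster.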

  • 3 kudos
2 More Replies
shrewdTurtle
by New Contributor II
  • 2803 Views
  • 5 replies
  • 5 kudos

Resolved! Cannot open Jobs tab in Databricks Community edition.

Hi, I get the following exception when I try to open the Jobs tab. Uncaught TypeError: Cannot read properties of undefined (reading 'apply')   Reload the page and try again. If the error persists, contact support. Reference error code: fd9ae37c18c1400cb15...

Latest Reply
shrewdTurtle
New Contributor II
  • 5 kudos

@Kaniz Fatma, @Werner Stinckens thanks for the clarification. I agree with @Werner Stinckens: the error message should be more useful.

  • 5 kudos
4 More Replies
NAS
by New Contributor III
  • 1589 Views
  • 1 replies
  • 1 kudos

Resolved! "import pandas as pd" => [Errno 5]

When I type import pandas as pd from a Notebook in a Repo I get: --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /usr/lib/python3.8/importlib/_boots...

Latest Reply
NAS
New Contributor III
  • 1 kudos

Thanks to Elliott Hertz, I found out that the ML Experiments cannot be stored in the repo. After I moved them to my Workspace everything seems to work.

  • 1 kudos
RohanB
by New Contributor III
  • 4202 Views
  • 8 replies
  • 3 kudos

Resolved! Spark Streaming - Checkpoint State EOF Exception

I have a Spark Structured Streaming job which reads from 2 Delta tables as streams, processes the data and then writes to a 3rd Delta table. The job is being run with the Databricks service on GCP. Sometimes the job fails with the following exception...

Latest Reply
RohanB
New Contributor III
  • 3 kudos

Hi @Jose Gonzalez, Do you require any more information regarding the code? Any idea what could be the cause of the issue? Thanks and Regards, Rohan

  • 3 kudos
7 More Replies
Jana
by New Contributor III
  • 6974 Views
  • 8 replies
  • 2 kudos

Resolved! Parsing 5 GB json file is running long on cluster

I was creating a Delta table from an ADLS JSON input file, but the job ran for a long time while creating the Delta table from JSON. Below is my cluster configuration. Is the issue related to the cluster config? Do I need to upgrade the cluster config? The cluster ...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

With multiline = true, the JSON is read as a whole and processed as such. I'd try with a beefier cluster.
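
Because a multiline JSON document is read as a whole, it cannot be split across workers. An alternative to a larger cluster is to rewrite the input as JSON Lines (one object per line), which can be read without the multiline option. The sketch below assumes the source file holds a single top-level JSON array, which the truncated question does not confirm; the helper name `json_array_to_jsonl` and the demo file names are hypothetical.

```python
import json
import os
import tempfile


def json_array_to_jsonl(src_path, dst_path):
    """Rewrite a file holding one JSON array as one JSON object per line."""
    with open(src_path, "r", encoding="utf-8") as src:
        # Loads the whole document into memory; a 5 GB file would need a
        # streaming parser instead, so treat this as an illustration only.
        records = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record) + "\n")
    return len(records)


# Small demo with a temporary multiline JSON file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.json")
dst_path = os.path.join(workdir, "output.jsonl")
with open(src_path, "w", encoding="utf-8") as handle:
    json.dump([{"id": 1}, {"id": 2}, {"id": 3}], handle, indent=2)

written = json_array_to_jsonl(src_path, dst_path)
print(written)
```

The conversion is a one-time cost, after which each line of the output file is an independent record.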

  • 2 kudos
7 More Replies
SCOR
by New Contributor II
  • 1928 Views
  • 3 replies
  • 4 kudos

SparkJDBC42.jar Issue ?

Hi there! I am using SparkJDBC42.jar in my Java application to access my Delta Lake tables. The connection is made through a Databricks SQL endpoint, where I created a database and stored my Delta tables in it. I have simple code to open a connection...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @Seifeddine SNOUSSI, Are you still having this issue, or were you able to resolve it? Please let us know.

  • 4 kudos
2 More Replies
Kody_Devl
by New Contributor II
  • 2235 Views
  • 2 replies
  • 1 kudos

Resolved! HTML Backup Import Into my Account

Hi All, I would like to import my HTML notebook backup into my Databricks account and use it as if it were my master (I am a developer and have many exported HTML backups that I may want to reuse). When you open an .HTML from backup, Databricks has, ...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Ross Crill, Did @Kaniz Fatma's reply help you resolve your question? Please let us know.

  • 1 kudos
1 More Replies
Dunken
by New Contributor III
  • 3577 Views
  • 7 replies
  • 3 kudos

Resolved! Databricks and CD4ML

I would like to use Databricks in a CD4ML way (see also https://martinfowler.com/articles/cd4ml.html). Is this possible? I would like to develop and train models in one environment; once qualified, I would like to deploy the model with the application...

Latest Reply
Atanu
Esteemed Contributor
  • 3 kudos

Is something like the below what you are looking for, @Armin Galliker?

  • 3 kudos
6 More Replies
brickster_2018
by Esteemed Contributor
  • 3780 Views
  • 2 replies
  • 2 kudos
Latest Reply
MoJaMa
Valued Contributor II
  • 2 kudos

Since Workflows (Multi-Task jobs) is now GA, one way to work around the 1000 concurrent jobs limit is to use tasks within a job. Each job can have 100 tasks, and these tasks do not count toward the concurrent job limit.
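
The workaround above can be sketched as a simple grouping step: pack the work into as few jobs as possible, each holding at most 100 tasks. The limit of 100 comes from the reply; the helper name `chunk_tasks` and the notebook names are hypothetical.

```python
def chunk_tasks(task_names, max_tasks_per_job=100):
    """Split a flat list of tasks into groups that each fit inside one job."""
    if max_tasks_per_job < 1:
        raise ValueError("max_tasks_per_job must be at least 1")
    return [
        task_names[start:start + max_tasks_per_job]
        for start in range(0, len(task_names), max_tasks_per_job)
    ]


# 250 hypothetical notebooks fit into 3 jobs instead of 250 separate ones.
notebooks = [f"notebook_{n}" for n in range(250)]
jobs = chunk_tasks(notebooks)
print([len(job) for job in jobs])  # [100, 100, 50]
```

Each resulting group would become the task list of one multi-task job, so only the number of groups counts toward the concurrent job limit.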

  • 2 kudos
1 More Replies
alejandrofm
by Valued Contributor
  • 2413 Views
  • 4 replies
  • 5 kudos

Resolved! Show Vacuum operation result (files deleted) without DRY RUN

Hi, I'm running some scheduled vacuum jobs and would like to know how many files were deleted without doing all the computation twice, with and without DRY RUN. Is there a way to accomplish this? Thanks!

Latest Reply
RKNutalapati
Valued Contributor
  • 5 kudos

We have to enable logging to capture the logs for vacuum: spark.conf.set("spark.databricks.delta.vacuum.logging.enabled","true")
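
With that setting enabled, vacuum runs are expected to appear in the table history, where the deleted-file count can be read back afterwards. The sketch below assumes the history rows carry a `VACUUM END` operation with a `numDeletedFiles` entry in `operationMetrics`; verify those names against your runtime's DESCRIBE HISTORY output. The helper name `deleted_file_counts` is hypothetical and the sample rows are made-up illustrative data, not output from a real table.

```python
def deleted_file_counts(history_rows):
    """Return (version, deleted file count) pairs for completed vacuum runs."""
    counts = []
    for row in history_rows:
        if row.get("operation") != "VACUUM END":
            continue
        metrics = row.get("operationMetrics") or {}
        if "numDeletedFiles" in metrics:
            # Metric values are treated as strings here, so convert before use.
            counts.append((row.get("version"), int(metrics["numDeletedFiles"])))
    return counts


# Illustrative rows shaped like history output converted to dictionaries.
sample_history = [
    {"version": 12, "operation": "VACUUM END",
     "operationMetrics": {"numDeletedFiles": "42", "numVacuumedDirectories": "3"}},
    {"version": 11, "operation": "VACUUM START",
     "operationMetrics": {"numFilesToDelete": "42"}},
    {"version": 10, "operation": "WRITE",
     "operationMetrics": {"numFiles": "5"}},
]
print(deleted_file_counts(sample_history))
```

In a notebook, the rows could come from the table's history converted to dictionaries; the sample list keeps the sketch runnable without a cluster.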

  • 5 kudos
3 More Replies
Suman
by New Contributor III
  • 2709 Views
  • 5 replies
  • 3 kudos

Resolved! Change Data Feed functionality from SQL Endpoint

I am trying to run a command to retrieve change data from a SQL endpoint. It throws the error below. "The input query contains unsupported data source(s). Only csv, json, avro, delta, parquet, orc, text data sources are supported on Databricks SQL." But th...

Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Suman Chowdhury, Change Data Feed is only available in Databricks Runtime 8.4 and above.

  • 3 kudos
4 More Replies
Oliver_Floyd
by Contributor
  • 1916 Views
  • 3 replies
  • 4 kudos

Resolved! How to update external metastore cluster configuration on the fly ?

Hello, In my use case, my data is pushed to an ADLS Gen2 container called ingest. After some data processing on a Databricks cluster of the ingest workspace, I declare the associated table in an external metastore for this workspace. At the end of this pr...

Latest Reply
Oliver_Floyd
Contributor
  • 4 kudos

Hello @Atanu Sarkar, Thank you for your answer. I have created a feature request. I hope it will be accepted soon ^^

  • 4 kudos
2 More Replies
Mradula
by New Contributor
  • 734 Views
  • 0 replies
  • 0 kudos

Displaying the queried data from mounted data from Azure Blob storage to databricks is slow

I have mounted my Azure Blob Storage JSON file, which is around 18 GB, to Databricks and am trying to perform a simple count operation on it. I am noticing that it takes 14 minutes in the Community Edition. Seeking answers on whether this is...

14 min count
