Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GNarain
by New Contributor II
  • 6139 Views
  • 7 replies
  • 4 kudos

Resolved! Is there an API call to set the "Table access control" workspace config?

Is there an API call to set the "Table access control" workspace config?

Latest Reply
SvenPeeters
New Contributor III
  • 4 kudos

Facing the same issue. I tried to fetch the current value via /api/2.0/workspace-conf?keys=enableTableAccessControl. Unfortunately this returns a 400: { "error_code": "BAD_REQUEST", "message": "Invalid keys: [\"enableTableAccessControl\"]" }
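For reference, a minimal sketch of calling the workspace-conf endpoint mentioned above from Python (the workspace URL and tokens are placeholders, and the exact key name for table access control is the open question in this thread):

import requests

HOST = "https://<workspace-url>"
TOKEN = "<databricks-personal-access-token>"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Read one or more workspace-conf keys (comma-separated); this is the call that
# returned 400 above because the key name was not recognised.
resp = requests.get(f"{HOST}/api/2.0/workspace-conf",
                    headers=HEADERS,
                    params={"keys": "enableTableAccessControl"})
print(resp.status_code, resp.text)

# Set a key with PATCH; only key names the API recognises are accepted.
resp = requests.patch(f"{HOST}/api/2.0/workspace-conf",
                      headers=HEADERS,
                      json={"enableTableAccessControl": "true"})
print(resp.status_code, resp.text)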

6 More Replies
Eldar_Dragomir
by New Contributor II
  • 1696 Views
  • 1 reply
  • 2 kudos

Resolved! Reprocessing the data with Auto Loader

Could you please give me an idea of how I can start reprocessing my data? Imagine I have a folder in ADLS Gen2, "/test", with binary files. They have already been processed by the current pipeline. I want to reprocess the data and continue receiving new data. What t...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

@Eldar_Dragomir In order to reprocess the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-...
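A minimal sketch of the approach described in the reply, with placeholder paths and table name: a fresh checkpoint location makes Auto Loader treat the existing files in "/test" as unseen, and cloudFiles.maxFilesPerTrigger throttles the backfill.

df = (
    spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "binaryFile")
        .option("cloudFiles.maxFilesPerTrigger", 100)   # cap files per micro-batch
        .load("abfss://<container>@<account>.dfs.core.windows.net/test")
)

(
    df.writeStream
      .option("checkpointLocation", "/checkpoints/test_reprocess_v2")  # new checkpoint dir
      .toTable("my_catalog.my_schema.test_binary_files")
)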

anarad429
by New Contributor
  • 1401 Views
  • 1 reply
  • 1 kudos

Resolved! Unity Catalog + Reading variable from external notebook

I am trying to run a notebook which reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. These sequences of notebooks run perfectly fine on a...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

I think the issue here is that the variable is not created until a value is assigned to it. So you may need to assign a value to get_sql_schema.
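A minimal sketch of the pattern the reply describes, with hypothetical notebook names (get_sql_schema is the variable from the thread); the child notebook must actually assign the variable so it exists in the caller's session after %run:

# --- ./config_notebook (the notebook included via %run) ---
get_sql_schema = "my_schema"   # the variable must be assigned a value here

# --- calling notebook ---
# %run ./config_notebook
print(get_sql_schema)          # defined, because %run executes the child notebook
                               # in the same interpreter session as the caller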

NathanLaw
by New Contributor III
  • 856 Views
  • 1 reply
  • 0 kudos

CPU and GPU Elapsed Runtimes

I have 2 questions about elapsed job runtimes. The same scoring notebook is run 3 times as 3 jobs. The jobs are identical, with the same Petastorm code, CPU cluster config (not a Spot cluster), and data, but they have varying elapsed runtimes. Elapsed runtimes...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does it stop after running some epochs if there is no improvement in the model performance?

Sanjay_AMP
by New Contributor II
  • 938 Views
  • 1 reply
  • 1 kudos

Deployment-ready sample source-code for Delta Live Table & Autoloader

Hi all, we are planning to develop an Auto Loader based DLT pipeline that needs to be deployable via a CI/CD pipeline and observable. Can somebody please point me to source code we can start from with a firm foundation, instead of falling into a newbie pattern ...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 1 kudos

Hi @Sanjay_AMP, Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage. Python code example: define a table called "customers" that reads data from a CSV file in cloud object storage. Define a...
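A minimal sketch of the pattern the reply outlines, with a placeholder path: a Delta Live Tables table named "customers" that ingests CSV files incrementally with Auto Loader.

import dlt

@dlt.table(name="customers", comment="Raw customers ingested from cloud object storage")
def customers():
    return (
        spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")
            .load("abfss://<container>@<account>.dfs.core.windows.net/customers/")
    )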

wojciech_jakubo
by New Contributor III
  • 9871 Views
  • 7 replies
  • 2 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

(Attached screenshot: driver memory cycles on a busy cluster)
Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

Hi @wojciech_jakubo 1. JVM memory will not be utilized for Python-related activities. 2. In the image we can only see the storage memory. We also have execution memory, which would be the same size. Hence I came up with the executor memory to be of ...
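As a quick complement to the reply: the pandas code runs in the driver's Python process, which sits outside the JVM heap shown in the Spark UI, so its footprint can be checked directly. A sketch assuming psutil is available (it ships with recent Databricks runtimes):

import os
import psutil

proc = psutil.Process(os.getpid())
print(f"driver Python process RSS: {proc.memory_info().rss / 1024**3:.2f} GiB")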

6 More Replies
kg6ka
by New Contributor
  • 2284 Views
  • 1 reply
  • 1 kudos

Is it possible to do without the GitHub token and integration?

Hey guys, I have a question. I have Databricks jobs in a workflow that are linked to my Databricks repo, which contains the necessary scripts for one job or another. That is, the job is linked to the Databricks repo. The main code is developed in gi...

Latest Reply
User16752239289
Databricks Employee
  • 1 kudos

Does the user the API token is generated from have the Git credential configured for the Git repo? If not, you can follow the steps here: https://docs.databricks.com/en/repos/get-access-tokens-from-git-provider.html
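A minimal sketch of registering a Git credential for the user that owns the API token, via the Git Credentials API (host, tokens, and username are placeholders):

import requests

HOST = "https://<workspace-url>"
TOKEN = "<databricks-personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/git-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "git_provider": "gitHub",
        "git_username": "<github-username>",
        "personal_access_token": "<github-personal-access-token>",
    },
)
print(resp.status_code, resp.json())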

Thor
by New Contributor III
  • 4899 Views
  • 1 reply
  • 2 kudos

Resolved! Dynamically change spark.task.cpus

Hello, I'm facing a problem with big tarballs to decompress, and to fit them in memory I had to stop Spark from processing too many files at the same time, so I changed the following property on my 8-core VMs cluster: spark.task.cpus 4. This setting is the thresh...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Thor, Spark does not offer the capability to dynamically modify configuration settings, such as spark.task.cpus, for individual stages or transformations while the application is running. Once a configuration property is set for a Spark applicati...
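To make the reply concrete: spark.task.cpus has to be set in the cluster's Spark config before startup (e.g. "spark.task.cpus 4") and then stays fixed for the whole application. A small sketch of checking the value at runtime:

# Value the cluster was started with; changing it from a notebook has no effect
# on a running application.
print(spark.sparkContext.getConf().get("spark.task.cpus", "1"))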

bharanireddy
by New Contributor
  • 1702 Views
  • 1 reply
  • 0 kudos

Resolved! Unable to access the Data Engineering with Databricks V3 course

Hello, since yesterday noon EST, the Data Engineering with Databricks V3 course has been in maintenance mode. Can someone please help restore access? Thank you, Bharani

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

I believe you will have access now.

ThomasVanBilsen
by New Contributor III
  • 2048 Views
  • 1 reply
  • 0 kudos

Resolved! Lineage graph not working.

Hey everyone, I've run the following code successfully: CREATE CATALOG IF NOT EXISTS lineage_data; CREATE SCHEMA IF NOT EXISTS lineage_data.lineagedemo; CREATE TABLE IF NOT EXISTS lineage_data.lineagedemo.menu (recipe_id INT, app string, main ...

Latest Reply
youssefmrini
Databricks Employee
  • 0 kudos

I recommend you open a ticket with support.

lazcanja
by New Contributor
  • 1620 Views
  • 0 replies
  • 0 kudos

How to update a table location from wasb to abfss

I created a table with a location such as: wasb://<container>@<storageaccount>.blob.core.windows.net/foldername. We have updated access to storage accounts to use abfss. I am trying to execute the following command: alter table mydatabase.mytable set...
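There is no reply in the thread; for reference, a minimal sketch of the general pattern for repointing an external table (names and path are placeholders, and the statement only updates metadata, it does not move any files):

spark.sql("""
  ALTER TABLE mydatabase.mytable
  SET LOCATION 'abfss://<container>@<storageaccount>.dfs.core.windows.net/foldername'
""")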

SamCallister
by New Contributor II
  • 17312 Views
  • 8 replies
  • 3 kudos

Dynamic Partition Overwrite for Delta Tables

Spark supports dynamic partition overwrite for parquet tables by setting the config: spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic") before writing to a partitioned table. With delta tables it appears you need to manually specif...

Latest Reply
alijen
New Contributor II
  • 3 kudos

@SamCallister wrote: Spark supports dynamic partition overwrite for parquet tables by setting the config: spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic") before writing to a partitioned table. With delta tables it appears you need ...
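A minimal sketch of the two options usually discussed for Delta tables (table and column names are placeholders; dynamic partition overwrite for Delta requires a sufficiently recent runtime):

# 1) replaceWhere: overwrite only the partitions covered by an explicit predicate.
(df.write.format("delta")
   .mode("overwrite")
   .option("replaceWhere", "event_date >= '2023-01-01' AND event_date < '2023-02-01'")
   .saveAsTable("mydb.events"))

# 2) Dynamic partition overwrite for Delta on newer runtimes.
(df.write.format("delta")
   .mode("overwrite")
   .option("partitionOverwriteMode", "dynamic")
   .saveAsTable("mydb.events"))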

7 More Replies
AleksandraFrolo
by New Contributor III
  • 953 Views
  • 0 replies
  • 0 kudos

Web scraping with Databricks

Hello, what is the easiest way to do web scraping in Databricks? Let's imagine that from this link: http://automated.pythonanywhere.com, I need to grab the element "/html/body/div[1]/div/h1[1]" and return its text. How can I do it? Can somebody write ...
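There is no reply in the thread; a minimal sketch using requests and lxml on the driver (the URL and XPath come from the question; if the element is rendered by JavaScript, a browser-based tool such as Selenium would be needed instead):

import requests
from lxml import html

page = requests.get("http://automated.pythonanywhere.com", timeout=10)
tree = html.fromstring(page.content)
matches = tree.xpath("/html/body/div[1]/div/h1[1]")
print(matches[0].text_content().strip() if matches else "element not found")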

Erik_L
by Contributor II
  • 1699 Views
  • 1 reply
  • 1 kudos

Structured Streaming from TimescaleDB?

I realize that the best practice would be to integrate our service with Kafka as a streaming source for Databricks, but given that the service already stores data into TimescaleDB, how can I stream data from TimescaleDB into DBX? Debezium doesn't wor...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

https://github.com/noctarius/timescaledb-event-streamer/ might help.

DatabricksPract
by New Contributor II
  • 7712 Views
  • 2 replies
  • 2 kudos

Resolved! Get metadata of tables in hive metastore

Hi team, I have a requirement to get the metadata of the tables available in the Databricks Hive metastore. Is there any way to get the metadata of all the tables instead of looping through them with DESCRIBE table_name? As the Hive metastore does not support inf...

Latest Reply
DatabricksPract
New Contributor II
  • 2 kudos

@Tharun-Kumar - Thanks for your quick reply, it worked.
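The accepted answer is not shown in this digest; one common approach (a sketch, not necessarily the solution that worked here) is the Spark catalog API, which returns table metadata without running DESCRIBE on each table:

from pyspark.sql import Row

rows = [
    Row(database=db.name, table=t.name, table_type=t.tableType, is_temporary=t.isTemporary)
    for db in spark.catalog.listDatabases()
    for t in spark.catalog.listTables(db.name)
]
display(spark.createDataFrame(rows))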

1 More Replies
