We have a SQL database designed in a star schema. We are migrating the data from SQL to Databricks. There are cubes designed using SSAS, and these cubes are used by end users in Excel for analysis purposes. We are now looking for a solution for: 1) Can...
Hi @shraddharane ,
1) Can cubes be migrated?
No, SSAS cubes cannot be directly migrated to Databricks. Databricks does not support the concept of multidimensional cubes like SSAS. Databricks is a Lakehouse architecture built on the foundation of Delta ...
Hi, I would like to know what you think about using Delta Live Tables when the source for the pipeline is not incremental. What I mean by that is: suppose the data provider creates a new folder with files for me each time it has an update to the...
When dealing with B2B data building, the process of updating and managing your data can present unique challenges. Since your data updates involve new folders with files and you need to process the entire new folder, the concept of incremental proces...
Hi @GNarain,
Here is an example of the API call:
POST /api/2.0/workspace/update
{ "workspaceAccessControlEnabled": true }

This API call will enable table access control for your workspace. You can make this API call u...
Could you try and let us know?
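For reference, a sketch of invoking the endpoint above from Python; the workspace URL and token are placeholders:

import requests

# Substitute your workspace URL and a personal access token.
resp = requests.post(
    "https://<databricks-instance>/api/2.0/workspace/update",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"workspaceAccessControlEnabled": True},
)
resp.raise_for_status()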
Could you please give me an idea of how I can start reprocessing my data? Imagine I have a folder in ADLS Gen2, "/test", with binary files. They were already processed with the current pipeline. I want to reprocess the data and continue receiving new data. What t...
@Eldar_Dragomir In order to re-process the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files processed per micro-...
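For illustration, a minimal sketch of such a re-process, assuming binary files under an abfss path and a Delta target (all paths and names are placeholders): pointing the stream at a fresh checkpoint directory makes Auto Loader start from the beginning, while cloudFiles.maxFilesPerTrigger throttles each micro-batch.

# Read the folder with Auto Loader; a new checkpoint re-processes everything.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")
      .option("cloudFiles.maxFilesPerTrigger", "100")  # cap files per micro-batch
      .load("abfss://container@account.dfs.core.windows.net/test"))

(df.writeStream
   .option("checkpointLocation", "/checkpoints/test_v2")  # fresh checkpoint directory
   .toTable("bronze.test_files"))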
I am trying to run a notebook that reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. These sequences of notebooks run perfectly fine on a...
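One common cause, for what it's worth: on Databricks the %run magic must be the only content of its cell; if anything else shares the cell, the referenced notebook is not executed and its variables stay undefined. A sketch, with a hypothetical notebook name:

# Cell 1 -- %run alone in the cell:
# %run ./shared_variables

# Cell 2 -- variables defined in shared_variables are now in scope:
print(my_shared_var)  # hypothetical variable defined in ./shared_variables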
I have 2 questions about elapsed job runtimes. The same Scoring notebook is run 3 times as 3 jobs. The jobs are identical, with the same PetaStorm code, CPU cluster config (not Spot clusters), and data, but they have varying elapsed runtimes. Elapsed runtimes...
Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does it stop after running some epochs if there is no improvement in model performance?
Hi all, we are planning to develop an Auto Loader-based DLT pipeline that needs to be:
- Deployable via a CI/CD pipeline
- Observable
Can somebody please point me to source code that we can start from as a firm foundation, instead of falling into a newbie pattern ...
Hi @Sanjay_AMP,
Delta Live Tables and AutoLoader can be used together to incrementally ingest data from cloud object storage.
• Python code example:
 - Define a table called "customers" that reads data from a CSV file in cloud object storage.
 - Define a...
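For illustration, a minimal sketch of such a pipeline, assuming a CSV landing path (the path and table name below are placeholders, not from the original answer):

import dlt

@dlt.table(name="customers", comment="Customers ingested incrementally with Auto Loader")
def customers():
    # Auto Loader ("cloudFiles") picks up only new CSV files on each update.
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("header", "true")
            .load("/Volumes/main/landing/customers/"))  # placeholder landing path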
Hi databricks/spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code executes on the driver node. I ran into out-of-memory problems and started exploring th...
Hi @wojciech_jakubo,
1. JVM memory will not be utilized for Python-related activities.
2. In the image we can only see the storage memory. We also have execution memory, which would be the same. Hence I came up with the executor memory to be of ...
Hello, I'm facing a problem with big tarballs to decompress. To make them fit in memory I had to stop Spark from processing too many files at the same time, so I changed the following property on my cluster of 8-core VMs: spark.task.cpus 4. This setting is the thresh...
Hi @Thor,
Spark does not offer the capability to dynamically modify configuration settings, such as spark.task.cpus, for individual stages or transformations while the application is running. Once a configuration property is set for a Spark applicati...
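As a hedged illustration, spark.task.cpus is read once when the application starts, so the only place to set it is the cluster's Spark config (or the session builder in a plain Spark deployment); the values below are examples:

from pyspark.sql import SparkSession

# spark.task.cpus applies to every stage for the lifetime of the application;
# it cannot be changed per stage or per transformation.
spark = (SparkSession.builder
         .config("spark.task.cpus", "4")       # each task reserves 4 cores
         .config("spark.executor.cores", "8")  # so at most 2 concurrent tasks per executor
         .getOrCreate())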
Hello, since yesterday noon EST, the Data Engineering with Databricks V3 course has been in maintenance mode. Can someone please help restore access? Thank you, Bharani
Hey everyone, I've run the following code successfully:
CREATE CATALOG IF NOT EXISTS lineage_data;
CREATE SCHEMA IF NOT EXISTS lineage_data.lineagedemo;
CREATE TABLE IF NOT EXISTS lineage_data.lineagedemo.menu (
  recipe_id INT,
  app STRING,
  main ...
I created a table with a location such as: wasb://<container>@<storageaccount>.blob.core.windows.net/foldername
We have updated access to storage accounts to use abfss. I am trying to execute the following command: alter table mydatabase.mytable set...
Hi @lazcanja, The error message indicates an issue with the configuration value for the storage account key. The error might be due to an incorrect or invalid key.
Given the information provided, you have correctly changed the configuration from spar...
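For reference, a hedged sketch of switching the table location to abfss, assuming the account key is supplied via a secret scope (the scope, key, and table names are hypothetical):

# Configure the abfss account key; "my-scope"/"storage-key" are hypothetical.
spark.conf.set(
    "fs.azure.account.key.<storageaccount>.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"))

spark.sql("""
    ALTER TABLE mydatabase.mytable
    SET LOCATION 'abfss://<container>@<storageaccount>.dfs.core.windows.net/foldername'
""")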
Spark supports dynamic partition overwrite for parquet tables by setting the config:
spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
before writing to a partitioned table. With Delta tables it appears you need to manually specif...
@SamCallister wrote: Spark supports dynamic partition overwrite for parquet tables by setting the config spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic") before writing to a partitioned table. With Delta tables it appears you need ...
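For what it's worth, newer Delta Lake releases (2.0 and above) do support dynamic partition overwrite at the writer level; a sketch, with the table and column names assumed:

# Only the partitions present in df are replaced; others are left untouched.
(df.write.format("delta")
   .mode("overwrite")
   .option("partitionOverwriteMode", "dynamic")
   .saveAsTable("events"))

# Alternative: scope the overwrite explicitly with replaceWhere.
(df.write.format("delta")
   .mode("overwrite")
   .option("replaceWhere", "event_date >= '2023-01-01'")
   .saveAsTable("events"))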
Hello, what is the easiest way to do web scraping in Databricks? Let's imagine that from this link: http://automated.pythonanywhere.com , I need to grab the element "/html/body/div[1]/div/h1[1]" and return its text. How can I do it? Can somebody write ...
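Not an official answer, but a minimal sketch using requests and lxml (both commonly available on Databricks runtimes), with the URL and XPath taken from the question:

import requests
from lxml import html

# Fetch the page and evaluate the XPath expression from the question.
page = requests.get("http://automated.pythonanywhere.com")
tree = html.fromstring(page.content)
matches = tree.xpath("/html/body/div[1]/div/h1[1]")
print(matches[0].text_content() if matches else "element not found")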
Hi team, I have a requirement to get the metadata of tables available in the Databricks Hive metastore. Is there any way to get the metadata of all the tables without looping through them using DESCRIBE table_name? As the Hive metastore does not support inf...
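As one possibility, a sketch that collects basic table metadata through the Spark catalog API instead of running DESCRIBE per table (the output columns are illustrative):

# Walk all databases and tables via the catalog API and collect the results.
rows = []
for db in spark.catalog.listDatabases():
    for t in spark.catalog.listTables(db.name):
        rows.append((t.database, t.name, t.tableType, t.isTemporary))

meta_df = spark.createDataFrame(rows, ["database", "table", "type", "is_temporary"])
display(meta_df)  # display() is available in Databricks notebooks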